Re: [ada, build] host/target configuration
> So, your case works because the manu/osys parsing wrongly detects/assigns
> a manufacturer »linux« and an operating system androideabi. Then, the
> following case fails, which is expected to yield identical results, with
> "complete triplets" -- which I took for granted in my reasoning about the
> Makefile code:
>
> $ make target_alias=arm-unknown-linux-androideabi
> target_alias = »arm-unknown-linux-androideabi«
> targ = »arm unknown linux androideabi«
> arch = »arm«
> manu = »unknown«
> osys = »linux«
> not matched
>
>
> My suggested change would make all these work -- however I have not yet
> had the time to fully digest your other emails with the reasoning that
> you need configure GCC with non-canonical target and target_alias set
> differently.

The whole discussion started from wrong premises since, contrary to what the ChangeLog says, neither Pascal nor I have anything to do with the original, problematic change (see PR ada/57188 for my take on it).

We all agree that the mess should be fixed somehow or other and Olivier is working on it.

--
Eric Botcazou
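The positional parsing visible in the quoted make output can be sketched in a few lines of C. This is only an illustration and not the Makefile code under discussion: the names split_alias and struct triplet are hypothetical, and the three-field alias used in the usage note is an assumed example. The sketch shows why a three-field alias ends up with »linux« as the manufacturer, while the complete four-field alias yields osys = »linux«.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three fields printed in the quoted output.  */
struct triplet
{
  char arch[32];
  char manu[32];
  char osys[32];
};

/* Naive positional split of ALIAS on '-': field 1 becomes arch, field 2
   manu and field 3 osys; any further fields are ignored.  Returns the
   number of fields found (at most 8 are counted).  */
int
split_alias (const char *alias, struct triplet *out)
{
  char buf[128];
  char *fields[8];
  char *tok;
  int n = 0;

  assert (alias != NULL && out != NULL);

  /* Work on a copy because strtok modifies its argument.  */
  snprintf (buf, sizeof buf, "%s", alias);
  for (tok = strtok (buf, "-"); tok != NULL && n < 8; tok = strtok (NULL, "-"))
    fields[n++] = tok;

  snprintf (out->arch, sizeof out->arch, "%s", n > 0 ? fields[0] : "");
  snprintf (out->manu, sizeof out->manu, "%s", n > 1 ? fields[1] : "");
  snprintf (out->osys, sizeof out->osys, "%s", n > 2 ? fields[2] : "");
  return n;
}
```

Calling split_alias on an assumed alias such as "arm-linux-androideabi" gives manu "linux" and osys "androideabi", whereas "arm-unknown-linux-androideabi" gives manu "unknown" and osys "linux", matching the output quoted above.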
Re: Patch ping - Add a new option "-fstack-protector-strong"
On Fri, 26 Apr 2013, Han Shen(沈涵) wrote:
> Hi, I'd like to ping the patch '-fstack-protector-strong':
>
> - http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00945.html
>   Add a new option '-fstack-protector-strong' to protect only
>   stack-smashing-vulnerable functions.

I see this is now in? Can you please propose some wording (ideally a patch) for http://gcc.gnu.org/gcc-4.9/changes.html ?

(http://gcc.gnu.org/projects/web.html has some more on our web pages.)

Gerald
Re: [PATCH RX] Added target specific macros for RX100, RX200, and RX600
On Thu, 2 May 2013, Sandeep Kumar Singh wrote:
> 2013-05-02 Sandeep Kumar Singh
>
> * rx/rx.h (TARGET_CPU_CPP_BUILTINS): Add macros for RX100, RX200, and
> RX600.
> * rx/rx.opt: Add macro for rx100 with string rx100 and value RX100.
> * rx/rx-opts.h (rx_cpu_types): Add new cpu type rx100.
> * rx/t-rx: Add rx100 under multi library matches option for nofpu
> option.

Mind also documenting this on http://gcc.gnu.org/gcc-4.9/changes.html ?

Let me know if you need help with the web pages.

Gerald
Re: [ada, build] host/target configuration
On May 31, 2013, Olivier Hainque wrote:

> - revert to our former computations, based on target and
>   not target_alias. Revert the subsequent adjustments as
>   well.

*nod*

> - Use target_alias explicitly just at the points where
>   we know that we need to depart from the canonical name

I suggest another approach: if there are significant differences between the run-time systems, they ought to be preserved in the canonical target names. So, adjust config.sub so that it preserves them, and then we can decide based on the canonical target name only.

--
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
[PATCH] Basic support for MIPS r5900
Hello,

after some months I have reworked the patch for r5900. It would be nice if this could be accepted.

The patch contains only the changes needed for basic support of the MIPS r5900. It can be used to compile a working Linux kernel for the Playstation 2. It is also possible to get Linux programs working with software floating point and ABI o32. Other stuff like hardware floating point and ABI n32 is not fully supported yet.

How many other changes would currently be accepted here? There is other stuff which I want to prepare and submit here, e.g.:

1. disable use of dmult and ddiv (ABI n32).
2. use trunc.w.s instead of cvt.w.s (to get single float working for normal range calculations; i.e. calculating without inf or nan).
3. fix use of ll/sc in libgomp, either increase mips ISA level or use syscall (which is broken in Linux 2.6.35.4).
4. fix libgcc to build a real muldi3 function for ABI n32 (not the multi3 function which is stored in muldi3.o file).
5. add support for configure parameters --float=single and --float=double in addition to --float=soft and --float=hard.
6. rework floating point to support single float with ABI n32 (either break the ABI or store floating point values in general purpose registers like soft float).
7. change libgcc or mips.md in a way that makes the non-IEEE-754-compatible FPU of the r5900 compatible.

Best regards
Jürgen

--- gcc/libgcc/config.host (revision 199343)
+++ gcc/libgcc/config.host (working copy)
@@ -739,7 +739,17 @@
   ;;
 mips*-*-linux*) # Linux MIPS, either endian.
   extra_parts="$extra_parts crtfastmath.o"
-  tmake_file="${tmake_file} t-crtfm mips/t-mips16"
+  tmake_file="${tmake_file} t-crtfm"
+  # Check for MicroMIPS support.
+  case ${host} in
+    mips64r5900* | mipsr5900*)
+      # MicroMIPS uses floating point instructions
+      # which are not supported on r5900.
+      ;;
+    *)
+      tmake_file="${tmake_file} mips/t-mips16"
+      ;;
+  esac
   md_unwind_header=mips/linux-unwind.h
   if test "${ac_cv_sizeof_long_double}" = 16; then
     tmake_file="${tmake_file} mips/t-tpbit"
@@ -777,10 +787,18 @@
   tmake_file="$tmake_file mips/t-elf mips/t-crtstuff mips/t-mips16"
   extra_parts="$extra_parts crti.o crtn.o"
   ;;
+mipsr5900-*-elf* | mipsr5900el-*-elf*)
+  tmake_file="$tmake_file mips/t-elf mips/t-crtstuff"
+  extra_parts="$extra_parts crti.o crtn.o"
+  ;;
 mips64-*-elf* | mips64el-*-elf*)
   tmake_file="$tmake_file mips/t-elf mips/t-crtstuff mips/t-mips16"
   extra_parts="$extra_parts crti.o crtn.o"
   ;;
+mips64r5900-*-elf* | mips64r5900el-*-elf*)
+  tmake_file="$tmake_file mips/t-elf mips/t-crtstuff"
+  extra_parts="$extra_parts crti.o crtn.o"
+  ;;
 mips64vr-*-elf* | mips64vrel-*-elf*)
   tmake_file="$tmake_file mips/t-elf mips/t-vr mips/t-crtstuff"
   extra_parts="$extra_parts crti.o crtn.o"
--- gcc/gcc/config.gcc (revision 199343)
+++ gcc/gcc/config.gcc (working copy)
@@ -1937,10 +1937,16 @@
   target_cpu_default="MASK_64BIT|MASK_FLOAT64"
   tm_defines="${tm_defines} MIPS_ISA_DEFAULT=64 MIPS_CPU_STRING_DEFAULT=\\\"sb1\\\" MIPS_ABI_DEFAULT=ABI_O64"
   ;;
-mips-*-elf* | mipsel-*-elf*)
+mips-*-elf* | mipsel-*-elf* | mipsr5900-*-elf* | mipsr5900el-*-elf*)
   tm_file="elfos.h newlib-stdint.h ${tm_file} mips/elf.h"
   tmake_file="mips/t-elf"
   ;;
+mips64r5900-*-elf* | mips64r5900el-*-elf*)
+  tm_file="elfos.h newlib-stdint.h ${tm_file} mips/elf.h"
+  tmake_file="mips/t-elf"
+  target_cpu_default="MASK_64BIT"
+  tm_defines="${tm_defines} MIPS_ISA_DEFAULT=3 MIPS_ABI_DEFAULT=ABI_N32"
+  ;;
 mips64-*-elf* | mips64el-*-elf*)
   tm_file="elfos.h newlib-stdint.h ${tm_file} mips/elf.h"
   tmake_file="mips/t-elf"
@@ -2973,6 +2979,19 @@
       ;;
     esac
     ;;
+  mips64r5900-*-*|mips64r5900el-*-*|mipsr5900-*-*|mipsr5900el-*-*)
+    with_arch=r5900
+    with_tune=r5900
+    if test x$with_llsc = x; then
+      # r5900 doesn't support ll, sc, lld and scd instructions:
+      with_llsc=no
+    fi
+    if test x$with_float = x; then
+      # r5900 doesn't support 64 bit float:
+      # 32 bit float doesn't comply with IEEE 754.
+      with_float=soft
+    fi
+    ;;
   mips*-*-vxworks)
     with_arch=mips2
     ;;
--- gcc/gcc/config/mips/mips.c (revision 199343)
+++ gcc/gcc/config/mips/mips.c (working copy)
@@ -1029,6 +1029,19 @@
     1, /* branch_cost */
     4 /* memory_latency */
   },
+  { /* R5900 */
+    COSTS_N_INSNS (4), /* fp_add */
+    COSTS_N_INSNS (4), /* fp_mult_sf */
+    COSTS_N_INSNS (256), /* fp_mult_df */
+    COSTS_N_INSNS (8), /* fp_div_sf */
+    COSTS_N_INSNS (256), /* fp_div_df */
+    COSTS_N_INSNS (4), /* int_mult_si */
+    COSTS_N_INSNS (256), /* int_mult_di */
+    COSTS_N_INSNS (37), /* int_div_si */
+    COSTS_N_INSNS (256), /* int_div_di */
+    1, /* branch_cost */
+    4 /* memory_latency */
+  },
   { /* R7000 */
     /* The only costs that are changed here are
        integer multiplication. */
@@ -13005,6 +13018,7 @@
     case PROCESSOR_R4130:
Re: [GOOGLE] Unrestrict early inline restrictions for AutoFDO
The patch was committed to google-4_8, but it causes a problem because einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause recursive inlining at einline stage (e.g. main->foo, foo->bar, bar->foo) when autofdo is enabled.

The following patch can fix the problem by doing more targeted early inlining:

Index: gcc/predict.c
===
--- gcc/predict.c (revision 199593)
+++ gcc/predict.c (working copy)
@@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
       && !maybe_hot_count_p (NULL,
                              edge->count))
     return false;
+  if (flag_auto_profile)
+    return false;
   if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
       || (edge->callee
           && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))

Performance testing on-going...

Dehao

On Wed, May 29, 2013 at 3:44 PM, Dehao Chen wrote:
> OK, I'll commit the early inline part.
>
> Dehao
>
> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li wrote:
>> The early inlining part is ok. The tracer optimization should be
>> revisited -- we should have more fine grain control on it (for
>> instance, based on FDO summary -- but that should be common to
>> FDO/LIPO).
>>
>> David
>>
>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen wrote:
>>> In gcc4-8, the max einline iterations are restricted to 1. For
>>> AutoFDO, this is bad because early inline is not size restricted. This
>>> patch allows einline to do multiple iterations in AutoFDO. It also
>>> enables tracer optimization in AutoFDO.
>>>
>>> Bootstrapped and passed regression test.
>>>
>>> OK for googel-4_8?
>>>
>>> Thanks,
>>> Dehao
>>>
>>> Index: gcc/ipa-inline.c
>>> ===
>>> --- gcc/ipa-inline.c (revision 199416)
>>> +++ gcc/ipa-inline.c (working copy)
>>> @@ -2161,7 +2161,8 @@ early_inliner (void)
>>>      {
>>>        /* We iterate incremental inlining to get trivial cases of indirect
>>>           inlining. */
>>> -      while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)
>>> +      while ((flag_auto_profile
>>> +              || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS))
>>>               && early_inline_small_functions (node))
>>>          {
>>>            timevar_push (TV_INTEGRATION);
>>> Index: gcc/opts.c
>>> ===
>>> --- gcc/opts.c (revision 199416)
>>> +++ gcc/opts.c (working copy)
>>> @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts,
>>>          opts->x_flag_peel_loops = value;
>>>        if (!opts_set->x_flag_value_profile_transformations)
>>>          opts->x_flag_value_profile_transformations = value;
>>> +      if (!opts_set->x_flag_tracer)
>>> +        opts->x_flag_tracer = value;
>>>        if (!opts_set->x_flag_inline_functions)
>>>          opts->x_flag_inline_functions = value;
>>>        if (!opts_set->x_flag_ipa_cp)
[PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL
Hi,

This patch adds value range information to tree SSA_NAME during the Value Range Propagation (VRP) pass, in preparation for removing some of the redundant sign/zero extensions during RTL expansion.

This is based on the original patch posted in http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00610.html and addresses the review comments of Richard Biener.

Tested on X86_64 and ARM.

I would like review comments on this.

Thanks,
Kugan

+2013-06-03 Kugan Vivekanandarajah
+
+ * gcc/gcc/tree-flow.h: Declared structure range_info_def and function
+ definition for mark_range_info_unknown.
+ * gcc/tree-ssa-alias.c (dump_alias_info) : Check pointer type
+ * gcc/tree-ssanames.c (make_ssa_name_fn) : Check pointer type in
+ initialize.
+ * (mark_range_info_unknown) : New function.
+ * (duplicate_ssa_name_range_info) : Likewise.
+ * (duplicate_ssa_name_fn) : Check pointer type and call correct
+ duplicate function.
+ * gcc/tree-vrp.c (extract_exp_value_range): New function.
+ * (simplify_stmt_using_ranges): Call extract_exp_value_range and
+ tree_ssa_set_value_range.
+ * gcc/tree.c (tree_ssa_set_value_range): New function.
+ * gcc/tree.h (SSA_NAME_PTR_INFO) : changed to access via union
+ * gcc/tree.h (SSA_NAME_RANGE_INFO) : New macro
+

diff --git a/gcc/tree-flow.h b/gcc/tree-flow.h
index 24fcfbf..dd4e2f5 100644
--- a/gcc/tree-flow.h
+++ b/gcc/tree-flow.h
@@ -147,6 +147,19 @@ struct GTY(()) ptr_info_def
   unsigned int misalign;
 };
 
+/* Value range information for SSA_NAMEs representing non-pointer variables. */
+
+struct GTY (()) range_info_def {
+  /* Set to true if VR_RANGE and false if VR_ANTI_RANGE. */
+  bool vr_range;
+  /* Minimum for value range. */
+  double_int min;
+  /* Maximum for value range. */
+  double_int max;
+  /* Set to true if range is valid. */
+  bool valid;
+};
+
 /* It is advantageous to avoid things like life analysis for
    variables which do not need PHI nodes. This enum describes
    whether or not a particular
@@ -532,6 +545,7 @@ extern void replace_ssa_name_symbol (tree, tree);
 extern bool get_ptr_info_alignment (struct ptr_info_def *, unsigned int *,
                                     unsigned int *);
 extern void mark_ptr_info_alignment_unknown (struct ptr_info_def *);
+extern void mark_range_info_unknown (struct range_info_def *);
 extern void set_ptr_info_alignment (struct ptr_info_def *, unsigned int,
                                     unsigned int);
 extern void adjust_ptr_info_misalignment (struct ptr_info_def *,
diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index 2ecd139..8ccecb5 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -404,6 +404,7 @@ dump_alias_info (FILE *file)
       struct ptr_info_def *pi;
 
       if (ptr == NULL_TREE
+          || !POINTER_TYPE_P (TREE_TYPE (ptr))
           || SSA_NAME_IN_FREE_LIST (ptr))
         continue;
 
diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c
index 0a405ce..420ae00 100644
--- a/gcc/tree-ssanames.c
+++ b/gcc/tree-ssanames.c
@@ -151,7 +151,11 @@ make_ssa_name_fn (struct function *fn, tree var, gimple stmt)
       SET_SSA_NAME_VAR_OR_IDENTIFIER (t, var);
     }
   SSA_NAME_DEF_STMT (t) = stmt;
-  SSA_NAME_PTR_INFO (t) = NULL;
+  if (POINTER_TYPE_P (TREE_TYPE (t)))
+    SSA_NAME_PTR_INFO (t) = NULL;
+  else
+    SSA_NAME_RANGE_INFO (t) = NULL;
+
   SSA_NAME_IN_FREE_LIST (t) = 0;
   SSA_NAME_IS_DEFAULT_DEF (t) = 0;
   imm = &(SSA_NAME_IMM_USE_NODE (t));
@@ -266,6 +270,14 @@ mark_ptr_info_alignment_unknown (struct ptr_info_def *pi)
   pi->misalign = 0;
 }
 
+/* Set the range described by RI has invalid values. */
+
+void
+mark_range_info_unknown (struct range_info_def *ri)
+{
+  ri->valid = false;
+}
+
 /* Store the the power-of-two byte alignment and the deviation from that
    alignment of pointer described by PI to ALIOGN and MISALIGN
    respectively. */
@@ -359,6 +371,26 @@ duplicate_ssa_name_ptr_info (tree name, struct ptr_info_def *ptr_info)
   SSA_NAME_PTR_INFO (name) = new_ptr_info;
 }
 
+/* Creates a duplicate of the range_info_def at RANGE_INFO for use by
+   the SSA name NAME. */
+void
+duplicate_ssa_name_range_info (tree name, struct range_info_def *range_info)
+{
+  struct range_info_def *new_range_info;
+
+  gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
+  gcc_assert (!SSA_NAME_RANGE_INFO (name));
+
+  if (!range_info)
+    return;
+
+  new_range_info = ggc_alloc_range_info_def ();
+  *new_range_info = *range_info;
+
+  SSA_NAME_RANGE_INFO (name) = new_range_info;
+}
+
+
 /* Creates a duplicate of a ssa name NAME tobe defined by statement STMT
    in function FN. */
 
@@ -367,10 +399,20 @@ tree
 duplicate_ssa_name_fn (struct function *fn, tree name, gimple stmt)
 {
   tree new_name = copy_ssa_name_fn (fn, name, stmt);
-  struct ptr_info_def *old_ptr_info = SSA_NAME_PTR_INFO (name);
+  if (POINTER_TYPE_P (TREE_TYPE (name)))
+    {
+      struct ptr_info_def *old_ptr_info = SSA_NAME_PTR_INFO (name);
+
+      if (old_ptr_info)
+duplica
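The idea behind range_info_def, mark_range_info_unknown and duplicate_ssa_name_range_info can be sketched outside GCC as follows. This is a simplified illustration, not the patch itself: it assumes long long bounds instead of double_int and malloc instead of GGC allocation, and the names range_info, range_info_mark_unknown and range_info_dup are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-in for the patch's range_info_def.  GCC stores the
   bounds as double_int; this sketch assumes long long is wide enough.  */
struct range_info
{
  bool vr_range;   /* true for VR_RANGE, false for VR_ANTI_RANGE */
  long long min;   /* minimum of the value range */
  long long max;   /* maximum of the value range */
  bool valid;      /* true if the range is valid */
};

/* Mark the range described by RI as unknown.  */
void
range_info_mark_unknown (struct range_info *ri)
{
  assert (ri != NULL);
  ri->valid = false;
}

/* Return a heap-allocated copy of SRC, or NULL if SRC is NULL or the
   allocation fails.  The caller owns the copy and must free it.  */
struct range_info *
range_info_dup (const struct range_info *src)
{
  struct range_info *copy;

  if (src == NULL)
    return NULL;

  copy = malloc (sizeof *copy);
  if (copy != NULL)
    *copy = *src;
  return copy;
}
```

As in the patch, duplicating produces an independent copy, so marking the copy unknown leaves the original range untouched.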
Re: [GOOGLE] Unrestrict early inline restrictions for AutoFDO
auto profile info is not available yet in early inlining, why would this change make any difference? Can you just reset the max_iters to a higher value for autoFDO? David On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen wrote: > The patch was committed to google-4_8, but it causes problem because > einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause > recursive inlining at einline stage (e.g. main->foo, foo->bar, > bar->foo) when autofdo is enabled. > > The following patch can fix the problem by doing more targetted early > inlining: > > Index: gcc/predict.c > === > --- gcc/predict.c (revision 199593) > +++ gcc/predict.c (working copy) > @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge) >&& !maybe_hot_count_p (NULL, > edge->count)) > return false; > + if (flag_auto_profile) > +return false; >if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED >|| (edge->callee >&& edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED)) > > Performance testing on-going... > > Dehao > > On Wed, May 29, 2013 at 3:44 PM, Dehao Chen wrote: >> OK, I'll commit the early inline part. >> >> Dehao >> >> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li >> wrote: >>> The early inlining part is ok. The tracer optimization should be >>> revisited -- we should have more fine grain control on it (for >>> instance, based on FDO summary -- but that should be common to >>> FDO/LIPO). >>> >>> David >>> >>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen wrote: In gcc4-8, the max einline iterations are restricted to 1. For AutoFDO, this is bad because early inline is not size restricted. This patch allows einline to do multiple iterations in AutoFDO. It also enables tracer optimization in AutoFDO. Bootstrapped and passed regression test. OK for googel-4_8? 
Thanks, Dehao Index: gcc/ipa-inline.c === --- gcc/ipa-inline.c (revision 199416) +++ gcc/ipa-inline.c (working copy) @@ -2161,7 +2161,8 @@ early_inliner (void) { /* We iterate incremental inlining to get trivial cases of indirect inlining. */ - while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS) + while ((flag_auto_profile + || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)) && early_inline_small_functions (node)) { timevar_push (TV_INTEGRATION); Index: gcc/opts.c === --- gcc/opts.c (revision 199416) +++ gcc/opts.c (working copy) @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts, opts->x_flag_peel_loops = value; if (!opts_set->x_flag_value_profile_transformations) opts->x_flag_value_profile_transformations = value; + if (!opts_set->x_flag_tracer) + opts->x_flag_tracer = value; if (!opts_set->x_flag_inline_functions) opts->x_flag_inline_functions = value; if (!opts_set->x_flag_ipa_cp)
[PATCH][2 of 2] RTL expansion for zero sign extension elimination with VRP
Hi,

This patch removes some of the redundant sign/zero extensions using value range information during RTL expansion.

When a GIMPLE_ASSIGN stmt with an LHS type smaller than a word is expanded to RTL, if we can prove that the RHS expression value always fits in the LHS type and there is no sign conversion, the truncation and extension to fit the type are redundant. For a SUBREG_PROMOTED_VAR_P, the subreg and zero/sign extensions are therefore redundant.

For example, when an expression is evaluated and its value is assigned to a variable of type short, the generated RTL would look something like the following.

(set (reg:SI 110)
     (zero_extend:SI (subreg:HI (reg:SI 117) 0)))

However, if during value range propagation we can say for certain that the value of the expression in register 117 is within the limits of short and there is no sign conversion, we do not need to perform the subreg and zero_extend; instead we can generate the following RTL.

(set (reg:SI 110)
     (reg:SI 117))

The same could be done for other assign statements.

This patch is based on the earlier attempt posted in http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00610.html and addresses the review comments of Richard Biener. I am post-processing the expand_expr_real_2 output in expand_gimple_stmt, though. The reason for this is that I would like to process all the possible assignment stmts, not just the CASE_CONVERT case and/or the REDUCE_BITFIELD.

This change, along with expansion, improves the geomean of the spec2k int benchmark with ref by about 3.5% on an ARM Chromebook.

Tested on X86_64 and ARM.

I would like review comments on this.

Thanks,
Kugan

+2013-06-03 Kugan Vivekanandarajah
+
+ * gcc/dojump.c (do_compare_and_jump): generates rtl without
+ zero/sign extension if redundant.
+ * gcc/cfgexpand.c (expand_gimple_stmt_1): Likewise.
+ * gcc/gimple.c (gimple_assign_is_zero_sign_ext_redundant) : New
+ function.
+ * gcc/gimple.h (gimple_assign_is_zero_sign_ext_redundant) : New
+ function definition.
+

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index c187273..ce980bc 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2311,6 +2311,17 @@ expand_gimple_stmt_1 (gimple stmt)
 
             if (temp == target)
               ;
+            /* If the value in SUBREG of temp fits that SUBREG (does not
+               overflow) and is assigned to target SUBREG of the same mode
+               without sign conversion, we can skip the SUBREG
+               and extension. */
+            else if (promoted
+                     && gimple_assign_is_zero_sign_ext_redundant (stmt)
+                     && (GET_CODE (temp) == SUBREG)
+                     && (GET_MODE (target) == GET_MODE (temp))
+                     && (GET_MODE (SUBREG_REG (target))
+                         == GET_MODE (SUBREG_REG (temp))))
+              emit_move_insn (SUBREG_REG (target), SUBREG_REG (temp));
             else if (promoted)
               {
                 int unsignedp = SUBREG_PROMOTED_UNSIGNED_P (target);
diff --git a/gcc/dojump.c b/gcc/dojump.c
index 3f04eac..cb13f3a 100644
--- a/gcc/dojump.c
+++ b/gcc/dojump.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see
 #include "ggc.h"
 #include "basic-block.h"
 #include "tm_p.h"
+#include "gimple.h"
 
 static bool prefer_and_bit_test (enum machine_mode, int);
 static void do_jump_by_parts_greater (tree, tree, int, rtx, rtx, int);
@@ -1108,6 +1109,60 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum rtx_code signed_code,
   type = TREE_TYPE (treeop0);
   mode = TYPE_MODE (type);
+
+  /* Is zero/sign extension redundant as per VRP. */
+  bool op0_ext_redundant = false;
+  bool op1_ext_redundant = false;
+
+  /* If promoted and the value in SUBREG of op0 fits (does not overflow),
+     it is a candidate for extension elimination. */
+  if (GET_CODE (op0) == SUBREG && SUBREG_PROMOTED_VAR_P (op0))
+    op0_ext_redundant =
+      gimple_assign_is_zero_sign_ext_redundant (SSA_NAME_DEF_STMT (treeop0));
+
+  /* If promoted and the value in SUBREG of op1 fits (does not overflow),
+     it is a candidate for extension elimination. */
+  if (GET_CODE (op1) == SUBREG && SUBREG_PROMOTED_VAR_P (op1))
+    op1_ext_redundant =
+      gimple_assign_is_zero_sign_ext_redundant (SSA_NAME_DEF_STMT (treeop1));
+
+  /* If zero/sign extension is redundant, generate RTL
+     for operands without zero/sign extension. */
+  if ((op0_ext_redundant || TREE_CODE (treeop0) == INTEGER_CST)
+      && (op1_ext_redundant || TREE_CODE (treeop1) == INTEGER_CST))
+    {
+      if (TREE_CODE (treeop1) == INTEGER_CST)
+        {
+          /* First operand is constant. */
+          rtx new_op0 = gen_reg_rtx (GET_MODE (SUBREG_REG (op0)));
+
+          emit_move_insn (new_op0, SUBREG_REG (op0));
+          op0 = new_op0;
+        }
+      else if (TREE_CODE (treeop0) == INTEGER_CST)
+        {
+          /* Other operand is constant. */
+          rtx new_op1 = gen_reg_rtx
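The "value fits in the LHS type" condition that the description relies on can be illustrated with a small standalone check. This is a sketch under stated assumptions rather than the body of gimple_assign_is_zero_sign_ext_redundant, which is not shown in this message: struct value_range and ext_is_redundant are hypothetical names, the range is a plain closed interval of long long values, and only types of up to 32 bits are handled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified value range: the closed interval [min, max].  */
struct value_range
{
  long long min;
  long long max;
};

/* Return true if every value in VR is representable in an integer type of
   PRECISION bits with the given signedness.  In that case truncating to
   the type and extending back again cannot change the value.  This sketch
   only handles precisions from 1 to 32 bits.  */
bool
ext_is_redundant (struct value_range vr, unsigned precision, bool is_unsigned)
{
  long long type_min, type_max;

  assert (precision >= 1 && precision <= 32);
  assert (vr.min <= vr.max);

  if (is_unsigned)
    {
      type_min = 0;
      type_max = (1LL << precision) - 1;
    }
  else
    {
      type_min = -(1LL << (precision - 1));
      type_max = (1LL << (precision - 1)) - 1;
    }

  return vr.min >= type_min && vr.max <= type_max;
}
```

For a 16-bit unsigned type, for instance, the range [0, 65535] passes the check, while [0, 65536] and [-1, 10] do not; the last case corresponds to the "no sign conversion" requirement mentioned above.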
Re: [GOOGLE] Unrestrict early inline restrictions for AutoFDO
On Sun, Jun 2, 2013 at 7:14 PM, Xinliang David Li wrote: > > auto profile info is not available yet in early inlining, why would > this change make any difference? Because the check of PARAM_EARLY_INLINING_INSNS is after the check of cgraph_maybe_hot_edge_p in early inline. If cgraph_maybe_hot_edge_p fails, the early inline will not happen even if growth is less than PARAM_EARLY_INLINING_INSNS. > > Can you just reset the max_iters to a > higher value for autoFDO? We could do that, but it could still lead to some code bloat because recursive inlines can happen for at most, say 10, iterations. Dehao > > David > > On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen wrote: > > The patch was committed to google-4_8, but it causes problem because > > einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause > > recursive inlining at einline stage (e.g. main->foo, foo->bar, > > bar->foo) when autofdo is enabled. > > > > The following patch can fix the problem by doing more targetted early > > inlining: > > > > Index: gcc/predict.c > > === > > --- gcc/predict.c (revision 199593) > > +++ gcc/predict.c (working copy) > > @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge) > >&& !maybe_hot_count_p (NULL, > > edge->count)) > > return false; > > + if (flag_auto_profile) > > +return false; > >if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED > >|| (edge->callee > >&& edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED)) > > > > Performance testing on-going... > > > > Dehao > > > > On Wed, May 29, 2013 at 3:44 PM, Dehao Chen wrote: > >> OK, I'll commit the early inline part. > >> > >> Dehao > >> > >> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li > >> wrote: > >>> The early inlining part is ok. The tracer optimization should be > >>> revisited -- we should have more fine grain control on it (for > >>> instance, based on FDO summary -- but that should be common to > >>> FDO/LIPO). 
> >>> > >>> David > >>> > >>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen wrote: > In gcc4-8, the max einline iterations are restricted to 1. For > AutoFDO, this is bad because early inline is not size restricted. This > patch allows einline to do multiple iterations in AutoFDO. It also > enables tracer optimization in AutoFDO. > > Bootstrapped and passed regression test. > > OK for googel-4_8? > > Thanks, > Dehao > > Index: gcc/ipa-inline.c > === > --- gcc/ipa-inline.c (revision 199416) > +++ gcc/ipa-inline.c (working copy) > @@ -2161,7 +2161,8 @@ early_inliner (void) > { > /* We iterate incremental inlining to get trivial cases of > indirect > inlining. */ > - while (iterations < PARAM_VALUE > (PARAM_EARLY_INLINER_MAX_ITERATIONS) > + while ((flag_auto_profile > + || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)) > && early_inline_small_functions (node)) > { > timevar_push (TV_INTEGRATION); > Index: gcc/opts.c > === > --- gcc/opts.c (revision 199416) > +++ gcc/opts.c (working copy) > @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts, > opts->x_flag_peel_loops = value; > if (!opts_set->x_flag_value_profile_transformations) > opts->x_flag_value_profile_transformations = value; > + if (!opts_set->x_flag_tracer) > + opts->x_flag_tracer = value; > if (!opts_set->x_flag_inline_functions) > opts->x_flag_inline_functions = value; > if (!opts_set->x_flag_ipa_cp)
[ACTIVITY] 10-14 May 2013
== Progress ==

* VRP based zero/sign extension
  - Tested and posted the latest patch

* Better end of loop counter optimisation
  - Tree level optimization are optimized in mainline
  - Christophe noted a slight change in asm generated from earlier version
  - tracked down the patch causing this and communicated this.

* Generate a single call to divmod
  - Looked at expand_divmod to understand how __aeabi_idiv and __aeabi_idivmod are generated.

== Plan ==

* Better end of loop counter optimisation
  - Change the pattern to remove this additional instruction if necessary.

* Generate a single call to divmod
  - Come up with a solution
Re: [ACTIVITY] 27-31 May 2013
Apologies for sending again. Corrected wrong dates in subject now.

On 03/06/13 12:19, Kugan wrote:

== Progress ==

* VRP based zero/sign extension
  - Tested and posted the latest patch

* Better end of loop counter optimisation
  - Tree level optimization are optimized in mainline
  - Christophe noted a slight change in asm generated from earlier version
  - tracked down the patch causing this and communicated this.

* Generate a single call to divmod
  - Looked at expand_divmod to understand how __aeabi_idiv and __aeabi_idivmod are generated.

== Plan ==

* Better end of loop counter optimisation
  - Change the pattern to remove this additional instruction if necessary.

* Generate a single call to divmod
  - Come up with a solution
Re: [GOOGLE] Unrestrict early inline restrictions for AutoFDO
If the purpose of the fix is to filter early inlinings with code growth in autoFDO, the proposed fix is the wrong way to do -- it changes the meaning of cgraph_maybe_hot_edge_p. David On Sun, Jun 2, 2013 at 7:25 PM, Dehao Chen wrote: > On Sun, Jun 2, 2013 at 7:14 PM, Xinliang David Li wrote: >> >> auto profile info is not available yet in early inlining, why would >> this change make any difference? > > Because the check of PARAM_EARLY_INLINING_INSNS is after the check of > cgraph_maybe_hot_edge_p in early inline. If > cgraph_maybe_hot_edge_p fails, the early inline will not happen even > if growth is less than PARAM_EARLY_INLINING_INSNS. > >> >> Can you just reset the max_iters to a >> higher value for autoFDO? > > We could do that, but it could still lead to some code bloat because > recursive inlines can happen for at most, say 10, iterations. > > Dehao > >> >> David >> >> On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen wrote: >> > The patch was committed to google-4_8, but it causes problem because >> > einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause >> > recursive inlining at einline stage (e.g. main->foo, foo->bar, >> > bar->foo) when autofdo is enabled. >> > >> > The following patch can fix the problem by doing more targetted early >> > inlining: >> > >> > Index: gcc/predict.c >> > === >> > --- gcc/predict.c (revision 199593) >> > +++ gcc/predict.c (working copy) >> > @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge) >> >&& !maybe_hot_count_p (NULL, >> > edge->count)) >> > return false; >> > + if (flag_auto_profile) >> > +return false; >> >if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED >> >|| (edge->callee >> >&& edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED)) >> > >> > Performance testing on-going... >> > >> > Dehao >> > >> > On Wed, May 29, 2013 at 3:44 PM, Dehao Chen wrote: >> >> OK, I'll commit the early inline part. 
>> >> >> >> Dehao >> >> >> >> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li >> >> wrote: >> >>> The early inlining part is ok. The tracer optimization should be >> >>> revisited -- we should have more fine grain control on it (for >> >>> instance, based on FDO summary -- but that should be common to >> >>> FDO/LIPO). >> >>> >> >>> David >> >>> >> >>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen wrote: >> In gcc4-8, the max einline iterations are restricted to 1. For >> AutoFDO, this is bad because early inline is not size restricted. This >> patch allows einline to do multiple iterations in AutoFDO. It also >> enables tracer optimization in AutoFDO. >> >> Bootstrapped and passed regression test. >> >> OK for googel-4_8? >> >> Thanks, >> Dehao >> >> Index: gcc/ipa-inline.c >> === >> --- gcc/ipa-inline.c (revision 199416) >> +++ gcc/ipa-inline.c (working copy) >> @@ -2161,7 +2161,8 @@ early_inliner (void) >> { >> /* We iterate incremental inlining to get trivial cases of >> indirect >> inlining. */ >> - while (iterations < PARAM_VALUE >> (PARAM_EARLY_INLINER_MAX_ITERATIONS) >> + while ((flag_auto_profile >> + || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)) >> && early_inline_small_functions (node)) >> { >> timevar_push (TV_INTEGRATION); >> Index: gcc/opts.c >> === >> --- gcc/opts.c (revision 199416) >> +++ gcc/opts.c (working copy) >> @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts, >> opts->x_flag_peel_loops = value; >> if (!opts_set->x_flag_value_profile_transformations) >> opts->x_flag_value_profile_transformations = value; >> + if (!opts_set->x_flag_tracer) >> + opts->x_flag_tracer = value; >> if (!opts_set->x_flag_inline_functions) >> opts->x_flag_inline_functions = value; >> if (!opts_set->x_flag_ipa_cp)
Re: [GOOGLE] Unrestrict early inline restrictions for AutoFDO
I've updated the patch to check it at ipa-inline: Index: gcc/ipa-inline.c === --- gcc/ipa-inline.c (revision 199593) +++ gcc/ipa-inline.c (working copy) @@ -434,6 +434,16 @@ want_early_inline_function_p (struct cgraph_edge * if (growth <= PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_ANY)) ; + else if (flag_auto_profile) + { + if (dump_file) +fprintf (dump_file, " will not early inline: %s/%i->%s/%i, " + "call is cold in profiling and code would grow by %i\n", + xstrdup (cgraph_node_name (e->caller)), e->caller->uid, + xstrdup (cgraph_node_name (callee)), callee->uid, + growth); +want_inline = false; + } else if (!cgraph_maybe_hot_edge_p (e)) { if (dump_file) Thanks, Dehao On Sun, Jun 2, 2013 at 9:08 PM, Xinliang David Li wrote: > If the purpose of the fix is to filter early inlinings with code > growth in autoFDO, the proposed fix is the wrong way to do -- it > changes the meaning of cgraph_maybe_hot_edge_p. > > David > > On Sun, Jun 2, 2013 at 7:25 PM, Dehao Chen wrote: >> On Sun, Jun 2, 2013 at 7:14 PM, Xinliang David Li wrote: >>> >>> auto profile info is not available yet in early inlining, why would >>> this change make any difference? >> >> Because the check of PARAM_EARLY_INLINING_INSNS is after the check of >> cgraph_maybe_hot_edge_p in early inline. If >> cgraph_maybe_hot_edge_p fails, the early inline will not happen even >> if growth is less than PARAM_EARLY_INLINING_INSNS. >> >>> >>> Can you just reset the max_iters to a >>> higher value for autoFDO? >> >> We could do that, but it could still lead to some code bloat because >> recursive inlines can happen for at most, say 10, iterations. >> >> Dehao >> >>> >>> David >>> >>> On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen wrote: >>> > The patch was committed to google-4_8, but it causes problem because >>> > einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause >>> > recursive inlining at einline stage (e.g. main->foo, foo->bar, >>> > bar->foo) when autofdo is enabled. 
>>> >
>>> > The following patch can fix the problem by doing more targeted early
>>> > inlining:
>>> >
>>> > Index: gcc/predict.c
>>> > ===
>>> > --- gcc/predict.c (revision 199593)
>>> > +++ gcc/predict.c (working copy)
>>> > @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
>>> >        && !maybe_hot_count_p (NULL,
>>> >                               edge->count))
>>> >      return false;
>>> > +  if (flag_auto_profile)
>>> > +    return false;
>>> >    if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
>>> >        || (edge->callee
>>> >            && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))
>>> >
>>> > Performance testing on-going...
>>> >
>>> > Dehao
>>> >
>>> > On Wed, May 29, 2013 at 3:44 PM, Dehao Chen wrote:
>>> >> OK, I'll commit the early inline part.
>>> >>
>>> >> Dehao
>>> >>
>>> >> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li
>>> >> wrote:
>>> >>> The early inlining part is ok. The tracer optimization should be
>>> >>> revisited -- we should have more fine-grained control over it (for
>>> >>> instance, based on FDO summary -- but that should be common to
>>> >>> FDO/LIPO).
>>> >>>
>>> >>> David
>>> >>>
>>> >>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen wrote:
>>> >>>> In gcc4-8, the max einline iterations are restricted to 1. For
>>> >>>> AutoFDO, this is bad because early inline is not size restricted. This
>>> >>>> patch allows einline to do multiple iterations in AutoFDO. It also
>>> >>>> enables tracer optimization in AutoFDO.
>>> >>>>
>>> >>>> Bootstrapped and passed regression test.
>>> >>>>
>>> >>>> OK for google-4_8?
>>> >>>>
>>> >>>> Thanks,
>>> >>>> Dehao
>>> >>>>
>>> >>>> Index: gcc/ipa-inline.c
>>> >>>> ===
>>> >>>> --- gcc/ipa-inline.c (revision 199416)
>>> >>>> +++ gcc/ipa-inline.c (working copy)
>>> >>>> @@ -2161,7 +2161,8 @@ early_inliner (void)
>>> >>>>      {
>>> >>>>        /* We iterate incremental inlining to get trivial cases of indirect
>>> >>>>           inlining.
>>> >>>>           */
>>> >>>> -  while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)
>>> >>>> +  while ((flag_auto_profile
>>> >>>> +          || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS))
>>> >>>>            && early_inline_small_functions (node))
>>> >>>>      {
>>> >>>>        timevar_push (TV_INTEGRATION);
>>> >>>> Index: gcc/opts.c
>>> >>>> ===
>>> >>>> --- gcc/opts.c (revision 199416)
>>> >>>> +++ gcc/opts.c (working copy)
>>> >>>> @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts,
>>> >>>>        opts->x_flag_peel_loops = value;
>>> >>>>        if (!opts_set->x_flag_value_profile_transformations)
>>> >>>>          opts->x_flag_value_profile_transformations = value;
>>> >>>>
>>> >>>>
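
For readers following the thread, the ordering of checks that the updated ipa-inline patch relies on can be modeled in a few lines of C. This is only a sketch under stated assumptions, not the code in gcc/ipa-inline.c: the struct names, the field names, and want_early_inline_model are hypothetical stand-ins, the threshold values are made up, and the real want_early_inline_function_p performs further checks that are left out here. It is meant to show why a branch on the AutoFDO flag placed ahead of the hot-edge test rejects every inline whose growth exceeds the "any" threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a call edge; not a GCC type.  */
struct edge_model
{
  int growth;      /* estimated size growth if the call is inlined */
  bool maybe_hot;  /* stands in for the result of cgraph_maybe_hot_edge_p */
};

/* Hypothetical model of the relevant settings; not GCC declarations.  */
struct config_model
{
  bool auto_profile;  /* stands in for flag_auto_profile */
  int insns_any;      /* stands in for PARAM_EARLY_INLINING_INSNS_ANY */
  int insns;          /* stands in for PARAM_EARLY_INLINING_INSNS */
};

/* Returns true if the modeled early inliner would accept the edge.
   The order of the tests follows the thread: the "any" threshold first,
   then the new AutoFDO branch, then the hot-edge test, and only after
   that the regular growth limit.  */
static bool
want_early_inline_model (const struct edge_model *e,
                         const struct config_model *c)
{
  assert (e != NULL && c != NULL);

  if (e->growth <= c->insns_any)
    return true;   /* small enough to inline regardless of profile */
  if (c->auto_profile)
    return false;  /* branch added by the patch: no growth under AutoFDO */
  if (!e->maybe_hot)
    return false;  /* cold edge: rejected before the growth limit is read */
  if (e->growth > c->insns)
    return false;  /* exceeds the regular early-inlining limit */
  return true;
}
```

With made-up thresholds of 2 and 11, an edge with growth 5 is rejected when auto_profile is set and accepted otherwise, while an edge with growth 1 is accepted in both cases.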
Re: [GOOGLE] Unrestrict early inline restrictions for AutoFDO
The patch is ok if the performance test passes. For a complete fix, is it
better to tune down PARAM_EARLY_INLINING_INSNS from 11 to a smaller value for
autoFDO, or to use a different parameter?

David

On Sun, Jun 2, 2013 at 9:19 PM, Dehao Chen wrote:
> I've updated the patch to check it at ipa-inline:
>
> Index: gcc/ipa-inline.c
> ===
> --- gcc/ipa-inline.c (revision 199593)
> +++ gcc/ipa-inline.c (working copy)
> @@ -434,6 +434,16 @@ want_early_inline_function_p (struct cgraph_edge *
>
>        if (growth <= PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_ANY))
>          ;
> +      else if (flag_auto_profile)
> +        {
> +          if (dump_file)
> +            fprintf (dump_file, " will not early inline: %s/%i->%s/%i, "
> +                     "call is cold in profiling and code would grow by %i\n",
> +                     xstrdup (cgraph_node_name (e->caller)), e->caller->uid,
> +                     xstrdup (cgraph_node_name (callee)), callee->uid,
> +                     growth);
> +          want_inline = false;
> +        }
>        else if (!cgraph_maybe_hot_edge_p (e))
>          {
>            if (dump_file)
>
> Thanks,
> Dehao
>
> On Sun, Jun 2, 2013 at 9:08 PM, Xinliang David Li wrote:
>> If the purpose of the fix is to filter early inlinings with code
>> growth in autoFDO, the proposed fix is the wrong way to do it -- it
>> changes the meaning of cgraph_maybe_hot_edge_p.
>>
>> David
>>
>> On Sun, Jun 2, 2013 at 7:25 PM, Dehao Chen wrote:
>>> On Sun, Jun 2, 2013 at 7:14 PM, Xinliang David Li
>>> wrote:
>>>>
>>>> auto profile info is not available yet in early inlining, why would
>>>> this change make any difference?
>>>
>>> Because the check of PARAM_EARLY_INLINING_INSNS is after the check of
>>> cgraph_maybe_hot_edge_p in early inline. If
>>> cgraph_maybe_hot_edge_p fails, the early inline will not happen even
>>> if growth is less than PARAM_EARLY_INLINING_INSNS.
>>>
>>>> Can you just reset the max_iters to a higher value for autoFDO?
>>>
>>> We could do that, but it could still lead to some code bloat because
>>> recursive inlines can happen for at most, say 10, iterations.
>>>
>>> Dehao
>>>
>>>> David
>>>>
>>>> On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen wrote:
>>>> > The patch was committed to google-4_8, but it causes a problem because
>>>> > einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause
>>>> > recursive inlining at the einline stage (e.g. main->foo, foo->bar,
>>>> > bar->foo) when autofdo is enabled.
>>>> >
>>>> > The following patch can fix the problem by doing more targeted early
>>>> > inlining:
>>>> >
>>>> > Index: gcc/predict.c
>>>> > ===
>>>> > --- gcc/predict.c (revision 199593)
>>>> > +++ gcc/predict.c (working copy)
>>>> > @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
>>>> >        && !maybe_hot_count_p (NULL,
>>>> >                               edge->count))
>>>> >      return false;
>>>> > +  if (flag_auto_profile)
>>>> > +    return false;
>>>> >    if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
>>>> >        || (edge->callee
>>>> >            && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))
>>>> >
>>>> > Performance testing on-going...
>>>> >
>>>> > Dehao
>>>> >
>>>> > On Wed, May 29, 2013 at 3:44 PM, Dehao Chen wrote:
>>>> >> OK, I'll commit the early inline part.
>>>> >>
>>>> >> Dehao
>>>> >>
>>>> >> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li
>>>> >> wrote:
>>>> >>> The early inlining part is ok. The tracer optimization should be
>>>> >>> revisited -- we should have more fine-grained control over it (for
>>>> >>> instance, based on FDO summary -- but that should be common to
>>>> >>> FDO/LIPO).
>>>> >>>
>>>> >>> David
>>>> >>>
>>>> >>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen wrote:
>>>> >>>> In gcc4-8, the max einline iterations are restricted to 1. For
>>>> >>>> AutoFDO, this is bad because early inline is not size restricted. This
>>>> >>>> patch allows einline to do multiple iterations in AutoFDO. It also
>>>> >>>> enables tracer optimization in AutoFDO.
>>>> >>>>
>>>> >>>> Bootstrapped and passed regression test.
>>>> >>>>
>>>> >>>> OK for google-4_8?
>>>> >>>>
>>>> >>>> Thanks,
>>>> >>>> Dehao
>>>> >>>>
>>>> >>>> Index: gcc/ipa-inline.c
>>>> >>>> ===
>>>> >>>> --- gcc/ipa-inline.c (revision 199416)
>>>> >>>> +++ gcc/ipa-inline.c (working copy)
>>>> >>>> @@ -2161,7 +2161,8 @@ early_inliner (void)
>>>> >>>>      {
>>>> >>>>        /* We iterate incremental inlining to get trivial cases of indirect
>>>> >>>>           inlining.
>>>> >>>>           */
>>>> >>>> -  while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)
>>>> >>>> +  while ((flag_auto_profile
>>>> >>>> +          || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS))
>>>> >>>>            && early_inline_small_functions (node))
>>>> >>>>      {
>>>> >>>>        timevar_push (TV_INTEGRATION);
>>>> >>>> Index: gcc/opts.c ==