On December 14, 2018 5:18:38 PM GMT+01:00, Richard Sandiford <richard.sandif...@arm.com> wrote: >Richard Biener <rguent...@suse.de> writes: >> On December 12, 2018 7:43:10 PM GMT+01:00, Richard Sandiford ><richard.sandif...@arm.com> wrote: >>>Richard Biener <richard.guent...@gmail.com> writes: >>>> On Thu, Dec 6, 2018 at 2:19 PM Richard Sandiford >>>>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and aarch64_be-elf. >>>>> Also repeated the performance testing (but haven't yet tried an >>>>> LTO variant; will do that over the weekend). >>>> >>>> Any results? >>> >>>Sorry, I should've remembered that finding time to run tests is easy, >>>finding time to analyse them is hard. >>> >>>Speed-wise, the impact of the patch for LTO is similar to without, >>>with 554.roms_r being the main beneficiary for both AArch64 and >x86_64. >>>I get a 6.8% improvement on Cortex-A57 with -Ofast -mcpu=native >>>-flto=jobserver. >>> >>>Size-wise, there are three tests that grow by >=2% on x86_64: >>> >>>549.fotonik3d_r: 5.5% >>>548.exchange2_r: 29.5% >>>554.roms_r: 39.6% >> >> Uh. With LTO we might have a reasonable guessed profile and you do >have a optimize_loop_nest_for_speed guard on the transform? > >Guard now added :-) But unfortunately it doesn't make any significant >difference. 548.exchange2_r goes from 29.5% to 27.7%, but the other >two are the same as before. > >> How does compile time fare with the above benchmarks? > >For 554.roms_r it's +80%(!) with -flto=1, but I think that's par for >the course given the increase in function sizes.
:( >For 549.fotonik3d_r it's +5% with -flto=1. > >For 503.bwaves_r (as an example of a benchmark whose size doesn't >change), >the difference is in the noise. > >[...] > >>>You mean something like: >>> >>> real :: foo(:,:), bar(:) >>> >>> do i=1,n >>> do j=1,n >>> foo(i,j) = ... >>> end do >>> bar(i) = .. >>> end do >>> >>>? I can add a test if so. >> >> Please. > >OK, I've added them to loop_versioning_4.f90. > >>>> There may also be some subtle issues with substitute_and_fold being >>>> applied to non-up-to-date SSA form given it folds stmts looking at >>>> (single-use!) SSA edges. The single-use guard might be what saves >>>you >>>> here (SSA uses in the copies are not yet updated to point to the >>>> copied DEFs). >>> >>>OK. I was hoping that because we only apply substitute_and_fold >>>to new code, there would be no problem with uses elsewhere. >> >> Might be, yes. >> >>>Would it be safer to: >>> >>> - version all loops we want to version >>> - update SSA explicitly >>> - apply substitute and fold to all "new" loops >> >> That would be definitely less fishy. But you need to get at the >actual >> 'new' SSA names for the replacements as I guess they'll be rewritten? >> Or maybe those are not. >> >>>? Could we then get away with returning a 0 TODO at the end? >> >> Yes. > >OK, the updated patch does it this way. > >Tested as before. OK. Thanks, Richard. >Thanks, >Richard > > >2018-12-14 Richard Sandiford <richard.sandif...@arm.com> > Ramana Radhakrishnan <ramana.radhakrish...@arm.com> > Kyrylo Tkachov <kyrylo.tkac...@arm.com> > >gcc/ > * doc/invoke.texi (-fversion-loops-for-strides): Document > (loop-versioning-group-size, loop-versioning-max-inner-insns) > (loop-versioning-max-outer-insns): Document new --params. > * Makefile.in (OBJS): Add gimple-loop-versioning.o. > * common.opt (fversion-loops-for-strides): New option. > * opts.c (default_options_table): Enable fversion-loops-for-strides > at -O3. > * params.def (PARAM_LOOP_VERSIONING_GROUP_SIZE) > (PARAM_LOOP_VERSIONING_MAX_INNER_INSNS) > (PARAM_LOOP_VERSIONING_MAX_OUTER_INSNS): New parameters. > * passes.def: Add pass_loop_versioning. > * timevar.def (TV_LOOP_VERSIONING): New time variable. > * tree-ssa-propagate.h > (substitute_and_fold_engine::substitute_and_fold): Add an optional > block parameter. > * tree-ssa-propagate.c > (substitute_and_fold_engine::substitute_and_fold): Likewise. > When passed, only walk blocks dominated by that block. > * tree-vrp.h (range_includes_p): Declare. > (range_includes_zero_p): Turn into an inline wrapper around > range_includes_p. > * tree-vrp.c (range_includes_p): New function, generalizing... > (range_includes_zero_p): ...this. > * tree-pass.h (make_pass_loop_versioning): Declare. > * gimple-loop-versioning.cc: New file. > >gcc/testsuite/ > * gcc.dg/loop-versioning-1.c: New test. > * gcc.dg/loop-versioning-10.c: Likewise. > * gcc.dg/loop-versioning-11.c: Likewise. > * gcc.dg/loop-versioning-2.c: Likewise. > * gcc.dg/loop-versioning-3.c: Likewise. > * gcc.dg/loop-versioning-4.c: Likewise. > * gcc.dg/loop-versioning-5.c: Likewise. > * gcc.dg/loop-versioning-6.c: Likewise. > * gcc.dg/loop-versioning-7.c: Likewise. > * gcc.dg/loop-versioning-8.c: Likewise. > * gcc.dg/loop-versioning-9.c: Likewise. > * gfortran.dg/loop_versioning_1.f90: Likewise. > * gfortran.dg/loop_versioning_2.f90: Likewise. > * gfortran.dg/loop_versioning_3.f90: Likewise. > * gfortran.dg/loop_versioning_4.f90: Likewise. > * gfortran.dg/loop_versioning_5.f90: Likewise. > * gfortran.dg/loop_versioning_6.f90: Likewise. > * gfortran.dg/loop_versioning_7.f90: Likewise. > * gfortran.dg/loop_versioning_8.f90: Likewise. > >Index: gcc/doc/invoke.texi >=================================================================== >--- gcc/doc/invoke.texi 2018-12-11 15:49:19.061544092 +0000 >+++ gcc/doc/invoke.texi 2018-12-14 15:02:53.813883464 +0000 >@@ -8220,7 +8220,8 @@ by @option{-O2} and also turns on the fo > -ftree-partial-pre @gol > -ftree-slp-vectorize @gol > -funswitch-loops @gol >--fvect-cost-model} >+-fvect-cost-model @gol >+-fversion-loops-for-strides} > > @item -O0 > @opindex O0 >@@ -10772,6 +10773,30 @@ of the loop on both branches (modified a > > Enabled by @option{-fprofile-use} and @option{-fauto-profile}. > >+@item -fversion-loops-for-strides >+@opindex fversion-loops-for-strides >+If a loop iterates over an array with a variable stride, create >another >+version of the loop that assumes the stride is always one. For >example: >+ >+@smallexample >+for (int i = 0; i < n; ++i) >+ x[i * stride] = @dots{}; >+@end smallexample >+ >+becomes: >+ >+@smallexample >+if (stride == 1) >+ for (int i = 0; i < n; ++i) >+ x[i] = @dots{}; >+else >+ for (int i = 0; i < n; ++i) >+ x[i * stride] = @dots{}; >+@end smallexample >+ >+This is particularly useful for assumed-shape arrays in Fortran where >+(for example) it allows better vectorization assuming contiguous >accesses. >+ > @item -ffunction-sections > @itemx -fdata-sections > @opindex ffunction-sections >@@ -11981,6 +12006,15 @@ Hardware autoprefetcher scheduler model > Number of lookahead cycles the model looks into; at ' > ' only enable instruction sorting heuristic. > >+@item loop-versioning-max-inner-insns >+The maximum number of instructions that an inner loop can have >+before the loop versioning pass considers it too big to copy. >+ >+@item loop-versioning-max-outer-insns >+The maximum number of instructions that an outer loop can have >+before the loop versioning pass considers it too big to copy, >+discounting any instructions in inner loops that directly benefit >+from versioning. > > @end table > @end table >Index: gcc/Makefile.in >=================================================================== >--- gcc/Makefile.in 2018-12-06 18:00:29.000000000 +0000 >+++ gcc/Makefile.in 2018-12-14 15:02:53.809883498 +0000 >@@ -1320,6 +1320,7 @@ OBJS = \ > gimple-laddress.o \ > gimple-loop-interchange.o \ > gimple-loop-jam.o \ >+ gimple-loop-versioning.o \ > gimple-low.o \ > gimple-pretty-print.o \ > gimple-ssa-backprop.o \ >Index: gcc/common.opt >=================================================================== >--- gcc/common.opt 2018-12-06 18:00:29.000000000 +0000 >+++ gcc/common.opt 2018-12-14 15:02:53.809883498 +0000 >@@ -2775,6 +2775,10 @@ fsplit-loops > Common Report Var(flag_split_loops) Optimization > Perform loop splitting. > >+fversion-loops-for-strides >+Common Report Var(flag_version_loops_for_strides) Optimization >+Version loops based on whether indices have a stride of one. >+ > funwind-tables > Common Report Var(flag_unwind_tables) Optimization > Just generate unwind tables for exception handling. >Index: gcc/opts.c >=================================================================== >--- gcc/opts.c 2018-12-06 18:00:29.000000000 +0000 >+++ gcc/opts.c 2018-12-14 15:02:53.813883464 +0000 >@@ -556,6 +556,7 @@ static const struct default_options defa > { OPT_LEVELS_3_PLUS, OPT_ftree_slp_vectorize, NULL, 1 }, > { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 }, >{ OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, >VECT_COST_MODEL_DYNAMIC }, >+ { OPT_LEVELS_3_PLUS, OPT_fversion_loops_for_strides, NULL, 1 }, > > /* -Ofast adds optimizations to -O3. */ > { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 }, >Index: gcc/params.def >=================================================================== >--- gcc/params.def 2018-12-06 18:00:29.000000000 +0000 >+++ gcc/params.def 2018-12-14 15:02:53.817883430 +0000 >@@ -1365,6 +1365,19 @@ DEFPARAM(PARAM_LOGICAL_OP_NON_SHORT_CIRC > "True if a non-short-circuit operation is optimal.", > -1, -1, 1) > >+DEFPARAM(PARAM_LOOP_VERSIONING_MAX_INNER_INSNS, >+ "loop-versioning-max-inner-insns", >+ "The maximum number of instructions in an inner loop that is being" >+ " considered for versioning.", >+ 200, 0, 0) >+ >+DEFPARAM(PARAM_LOOP_VERSIONING_MAX_OUTER_INSNS, >+ "loop-versioning-max-outer-insns", >+ "The maximum number of instructions in an outer loop that is being" >+ " considered for versioning, on top of the instructions in inner" >+ " loops.", >+ 100, 0, 0) >+ > /* > > Local variables: >Index: gcc/passes.def >=================================================================== >--- gcc/passes.def 2018-12-06 18:00:29.000000000 +0000 >+++ gcc/passes.def 2018-12-14 15:02:53.817883430 +0000 >@@ -265,6 +265,7 @@ along with GCC; see the file COPYING3. > NEXT_PASS (pass_tree_unswitch); > NEXT_PASS (pass_scev_cprop); > NEXT_PASS (pass_loop_split); >+ NEXT_PASS (pass_loop_versioning); > NEXT_PASS (pass_loop_jam); > /* All unswitching, final value replacement and splitting can expose > empty loops. Remove them now. */ >Index: gcc/timevar.def >=================================================================== >--- gcc/timevar.def 2018-12-06 18:00:29.000000000 +0000 >+++ gcc/timevar.def 2018-12-14 15:02:53.821883396 +0000 >@@ -234,6 +234,7 @@ DEFTIMEVAR (TV_DSE1 , " > DEFTIMEVAR (TV_DSE2 , "dead store elim2") > DEFTIMEVAR (TV_LOOP , "loop analysis") > DEFTIMEVAR (TV_LOOP_INIT , "loop init") >+DEFTIMEVAR (TV_LOOP_VERSIONING , "loop versioning") > DEFTIMEVAR (TV_LOOP_MOVE_INVARIANTS , "loop invariant motion") > DEFTIMEVAR (TV_LOOP_UNROLL , "loop unrolling") > DEFTIMEVAR (TV_LOOP_DOLOOP , "loop doloop") >Index: gcc/tree-ssa-propagate.h >=================================================================== >--- gcc/tree-ssa-propagate.h 2018-12-06 18:00:29.000000000 +0000 >+++ gcc/tree-ssa-propagate.h 2018-12-14 15:02:53.821883396 +0000 >@@ -104,7 +104,7 @@ extern void propagate_tree_value_into_st > virtual bool fold_stmt (gimple_stmt_iterator *) { return false; } > virtual tree get_value (tree) { return NULL_TREE; } > >- bool substitute_and_fold (void); >+ bool substitute_and_fold (basic_block = NULL); > bool replace_uses_in (gimple *); > bool replace_phi_args_in (gphi *); > }; >Index: gcc/tree-ssa-propagate.c >=================================================================== >--- gcc/tree-ssa-propagate.c 2018-12-06 18:00:29.000000000 +0000 >+++ gcc/tree-ssa-propagate.c 2018-12-14 15:02:53.821883396 +0000 >@@ -1154,6 +1154,10 @@ substitute_and_fold_dom_walker::before_d > > > /* Perform final substitution and folding of propagated values. >+ Process the whole function if BLOCK is null, otherwise only >+ process the blocks that BLOCK dominates. In the latter case, >+ it is the caller's responsibility to ensure that dominator >+ information is available and up-to-date. > > PROP_VALUE[I] contains the single value that should be substituted > at every use of SSA name N_I. If PROP_VALUE is NULL, no values are >@@ -1170,16 +1174,24 @@ substitute_and_fold_dom_walker::before_d > Return TRUE when something changed. */ > > bool >-substitute_and_fold_engine::substitute_and_fold (void) >+substitute_and_fold_engine::substitute_and_fold (basic_block block) > { > if (dump_file && (dump_flags & TDF_DETAILS)) >fprintf (dump_file, "\nSubstituting values and folding >statements\n\n"); > > memset (&prop_stats, 0, sizeof (prop_stats)); > >- calculate_dominance_info (CDI_DOMINATORS); >+ /* Don't call calculate_dominance_info when iterating over a >subgraph. >+ Callers that are using the interface this way are likely to want >to >+ iterate over several disjoint subgraphs, and it would be >expensive >+ in enable-checking builds to revalidate the whole dominance tree >+ each time. */ >+ if (block) >+ gcc_assert (dom_info_state (CDI_DOMINATORS)); >+ else >+ calculate_dominance_info (CDI_DOMINATORS); > substitute_and_fold_dom_walker walker (CDI_DOMINATORS, this); >- walker.walk (ENTRY_BLOCK_PTR_FOR_FN (cfun)); >+ walker.walk (block ? block : ENTRY_BLOCK_PTR_FOR_FN (cfun)); > > /* We cannot remove stmts during the BB walk, especially not release > SSA names there as that destroys the lattice of our callers. >Index: gcc/tree-vrp.h >=================================================================== >--- gcc/tree-vrp.h 2018-12-06 18:00:29.000000000 +0000 >+++ gcc/tree-vrp.h 2018-12-14 15:02:53.821883396 +0000 >@@ -243,7 +243,7 @@ struct assert_info > extern void register_edge_assert_for (tree, edge, enum tree_code, > tree, tree, vec<assert_info> &); > extern bool stmt_interesting_for_vrp (gimple *); >-extern bool range_includes_zero_p (const value_range_base *); >+extern bool range_includes_p (const value_range_base *, >HOST_WIDE_INT); > extern bool infer_value_range (gimple *, tree, tree_code *, tree *); > > extern bool vrp_bitmap_equal_p (const_bitmap, const_bitmap); >@@ -285,4 +285,12 @@ extern tree get_single_symbol (tree, boo > extern void maybe_set_nonzero_bits (edge, tree); >extern value_range_kind determine_value_range (tree, wide_int *, >wide_int *); > >+/* Return TRUE if *VR includes the value zero. */ >+ >+inline bool >+range_includes_zero_p (const value_range_base *vr) >+{ >+ return range_includes_p (vr, 0); >+} >+ > #endif /* GCC_TREE_VRP_H */ >Index: gcc/tree-vrp.c >=================================================================== >--- gcc/tree-vrp.c 2018-12-07 14:59:04.255508495 +0000 >+++ gcc/tree-vrp.c 2018-12-14 15:02:53.821883396 +0000 >@@ -1173,15 +1173,14 @@ value_inside_range (tree val, tree min, > } > > >-/* Return TRUE if *VR includes the value zero. */ >+/* Return TRUE if *VR includes the value X. */ > > bool >-range_includes_zero_p (const value_range_base *vr) >+range_includes_p (const value_range_base *vr, HOST_WIDE_INT x) > { > if (vr->varying_p () || vr->undefined_p ()) > return true; >- tree zero = build_int_cst (vr->type (), 0); >- return vr->may_contain_p (zero); >+ return vr->may_contain_p (build_int_cst (vr->type (), x)); > } > >/* If *VR has a value range that is a single constant value return >that, >Index: gcc/tree-pass.h >=================================================================== >--- gcc/tree-pass.h 2018-12-06 18:00:29.000000000 +0000 >+++ gcc/tree-pass.h 2018-12-14 15:02:53.821883396 +0000 >@@ -362,6 +362,7 @@ extern gimple_opt_pass *make_pass_fix_lo > extern gimple_opt_pass *make_pass_tree_loop (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_tree_no_loop (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt); >+extern gimple_opt_pass *make_pass_loop_versioning (gcc::context >*ctxt); > extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_linterchange (gcc::context *ctxt); > extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt); >Index: gcc/gimple-loop-versioning.cc >=================================================================== >--- /dev/null 2018-11-29 13:15:04.463550658 +0000 >+++ gcc/gimple-loop-versioning.cc 2018-12-14 15:02:53.813883464 +0000 >@@ -0,0 +1,1758 @@ >+/* Loop versioning pass. >+ Copyright (C) 2018 Free Software Foundation, Inc. >+ >+This file is part of GCC. >+ >+GCC is free software; you can redistribute it and/or modify it >+under the terms of the GNU General Public License as published by the >+Free Software Foundation; either version 3, or (at your option) any >+later version. >+ >+GCC is distributed in the hope that it will be useful, but WITHOUT >+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or >+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License >+for more details. >+ >+You should have received a copy of the GNU General Public License >+along with GCC; see the file COPYING3. If not see >+<http://www.gnu.org/licenses/>. */ >+ >+#include "config.h" >+#include "system.h" >+#include "coretypes.h" >+#include "backend.h" >+#include "tree.h" >+#include "gimple.h" >+#include "gimple-iterator.h" >+#include "tree-pass.h" >+#include "gimplify-me.h" >+#include "cfgloop.h" >+#include "tree-ssa-loop.h" >+#include "ssa.h" >+#include "tree-scalar-evolution.h" >+#include "tree-chrec.h" >+#include "tree-ssa-loop-ivopts.h" >+#include "fold-const.h" >+#include "tree-ssa-propagate.h" >+#include "tree-inline.h" >+#include "domwalk.h" >+#include "alloc-pool.h" >+#include "vr-values.h" >+#include "gimple-ssa-evrp-analyze.h" >+#include "tree-vectorizer.h" >+#include "omp-general.h" >+#include "predict.h" >+#include "tree-into-ssa.h" >+#include "params.h" >+ >+namespace { >+ >+/* This pass looks for loops that could be simplified if certain loop >+ invariant conditions were true. It is effectively a form of loop >+ splitting in which the pass produces the split conditions itself, >+ instead of using ones that are already present in the IL. >+ >+ Versioning for when strides are 1 >+ --------------------------------- >+ >+ At the moment the only thing the pass looks for are memory >references >+ like: >+ >+ for (auto i : ...) >+ ...x[i * stride]... >+ >+ It considers changing such loops to: >+ >+ if (stride == 1) >+ for (auto i : ...) [A] >+ ...x[i]... >+ else >+ for (auto i : ...) [B] >+ ...x[i * stride]... >+ >+ This can have several benefits: >+ >+ (1) [A] is often easier or cheaper to vectorize than [B]. >+ >+ (2) The scalar code in [A] is simpler than the scalar code in [B] >+ (if the loops cannot be vectorized or need an epilogue loop). >+ >+ (3) We might recognize [A] as a pattern, such as a memcpy or >memset. >+ >+ (4) [A] has simpler address evolutions, which can help other passes >+ like loop interchange. >+ >+ The optimization is particularly useful for assumed-shape arrays in >+ Fortran, where the stride of the innermost dimension depends on the >+ array descriptor but is often equal to 1 in practice. For example: >+ >+ subroutine f1(x) >+ real :: x(:) >+ x(:) = 100 >+ end subroutine f1 >+ >+ generates the equivalent of: >+ >+ raw_stride = *x.dim[0].stride; >+ stride = raw_stride != 0 ? raw_stride : 1; >+ x_base = *x.data; >+ ... >+ tmp1 = stride * S; >+ tmp2 = tmp1 - stride; >+ *x_base[tmp2] = 1.0e+2; >+ >+ but in the common case that stride == 1, the last three statements >+ simplify to: >+ >+ tmp3 = S + -1; >+ *x_base[tmp3] = 1.0e+2; >+ >+ The optimization is in principle very simple. The difficult parts >are: >+ >+ (a) deciding which parts of a general address calculation >correspond >+ to the inner dimension of an array, since this usually isn't >explicit >+ in the IL, and for C often isn't even explicit in the source >code >+ >+ (b) estimating when the transformation is worthwhile >+ >+ Structure >+ --------- >+ >+ The pass has four phases: >+ >+ (1) Walk through the statements looking for and recording potential >+ versioning opportunities. Stop if there are none. >+ >+ (2) Use context-sensitive range information to see whether any >versioning >+ conditions are impossible in practice. Remove them if so, and >stop >+ if no opportunities remain. >+ >+ (We do this only after (1) to keep compile time down when no >+ versioning opportunities exist.) >+ >+ (3) Apply the cost model. Decide which versioning opportunities >are >+ worthwhile and at which nesting level they should be applied. >+ >+ (4) Attempt to version all the loops selected by (3), so that: >+ >+ for (...) >+ ... >+ >+ becomes: >+ >+ if (!cond) >+ for (...) // Original loop >+ ... >+ else >+ for (...) // New loop >+ ... >+ >+ Use the version condition COND to simplify the new loop. */ >+ >+/* Enumerates the likelihood that a particular value indexes the inner >+ dimension of an array. */ >+enum inner_likelihood { >+ INNER_UNLIKELY, >+ INNER_DONT_KNOW, >+ INNER_LIKELY >+}; >+ >+/* Information about one term of an address_info. */ >+struct address_term_info >+{ >+ /* The value of the term is EXPR * MULTIPLIER. */ >+ tree expr; >+ unsigned HOST_WIDE_INT multiplier; >+ >+ /* The stride applied by EXPR in each iteration of some unrecorded >loop, >+ or null if no stride has been identified. */ >+ tree stride; >+ >+ /* Enumerates the likelihood that EXPR indexes the inner dimension >+ of an array. */ >+ enum inner_likelihood inner_likelihood; >+ >+ /* True if STRIDE == 1 is a versioning opportunity when considered >+ in isolation. */ >+ bool versioning_opportunity_p; >+}; >+ >+/* Information about an address calculation, and the range of constant >+ offsets applied to it. */ >+struct address_info >+{ >+ static const unsigned int MAX_TERMS = 8; >+ >+ /* One statement that calculates the address. If multiple >statements >+ share the same address, we only record the first. */ >+ gimple *stmt; >+ >+ /* The loop containing STMT (cached for convenience). If multiple >+ statements share the same address, they all belong to this loop. >*/ >+ struct loop *loop; >+ >+ /* A decomposition of the calculation into a sum of terms plus an >+ optional base. When BASE is provided, it is never an SSA name. >+ Once initialization is complete, all members of TERMs are SSA >names. */ >+ tree base; >+ auto_vec<address_term_info, MAX_TERMS> terms; >+ >+ /* All bytes accessed from the address fall in the offset range >+ [MIN_OFFSET, MAX_OFFSET). */ >+ HOST_WIDE_INT min_offset, max_offset; >+}; >+ >+/* Stores addresses based on their base and terms (ignoring the >offsets). */ >+struct address_info_hasher : nofree_ptr_hash <address_info> >+{ >+ static hashval_t hash (const address_info *); >+ static bool equal (const address_info *, const address_info *); >+}; >+ >+/* Information about the versioning we'd like to apply to a loop. */ >+struct loop_info >+{ >+ bool worth_versioning_p () const; >+ >+ /* True if we've decided not to version this loop. The remaining >+ fields are meaningless if so. */ >+ bool rejected_p; >+ >+ /* True if at least one subloop of this loop benefits from >versioning. */ >+ bool subloops_benefit_p; >+ >+ /* An estimate of the total number of instructions in the loop, >+ excluding those in subloops that benefit from versioning. */ >+ unsigned int num_insns; >+ >+ /* The outermost loop that can handle all the version checks >+ described below. */ >+ struct loop *outermost; >+ >+ /* The first entry in the list of blocks that belong to this loop >+ (and not to subloops). m_next_block_in_loop provides the chain >+ pointers for the list. */ >+ basic_block block_list; >+ >+ /* We'd like to version the loop for the case in which these SSA >names >+ (keyed off their SSA_NAME_VERSION) are all equal to 1 at runtime. > */ >+ bitmap_head unity_names; >+ >+ /* If versioning succeeds, this points the version of the loop that >+ assumes the version conditions holds. */ >+ struct loop *optimized_loop; >+}; >+ >+/* The main pass structure. */ >+class loop_versioning >+{ >+public: >+ loop_versioning (function *); >+ ~loop_versioning (); >+ unsigned int run (); >+ >+private: >+ /* Used to walk the dominator tree to find loop versioning >conditions >+ that are always false. */ >+ class lv_dom_walker : public dom_walker >+ { >+ public: >+ lv_dom_walker (loop_versioning &); >+ >+ edge before_dom_children (basic_block) FINAL OVERRIDE; >+ void after_dom_children (basic_block) FINAL OVERRIDE; >+ >+ private: >+ /* The parent pass. */ >+ loop_versioning &m_lv; >+ >+ /* Used to build context-dependent range information. */ >+ evrp_range_analyzer m_range_analyzer; >+ }; >+ >+ /* Used to simplify statements based on conditions that are >established >+ by the version checks. */ >+ class name_prop : public substitute_and_fold_engine >+ { >+ public: >+ name_prop (loop_info &li) : m_li (li) {} >+ tree get_value (tree) FINAL OVERRIDE; >+ >+ private: >+ /* Information about the versioning we've performed on the loop. >*/ >+ loop_info &m_li; >+ }; >+ >+ loop_info &get_loop_info (struct loop *loop) { return >m_loops[loop->num]; } >+ >+ unsigned int max_insns_for_loop (struct loop *); >+ bool expensive_stmt_p (gimple *); >+ >+ void version_for_unity (gimple *, tree); >+ bool acceptable_multiplier_p (tree, unsigned HOST_WIDE_INT, >+ unsigned HOST_WIDE_INT * = 0); >+ bool acceptable_type_p (tree, unsigned HOST_WIDE_INT *); >+ bool multiply_term_by (address_term_info &, tree); >+ inner_likelihood get_inner_likelihood (tree, unsigned >HOST_WIDE_INT); >+ void analyze_stride (address_info &, address_term_info &, >+ tree, struct loop *); >+ bool find_per_loop_multiplication (address_info &, address_term_info >&); >+ void analyze_term_using_scevs (address_info &, address_term_info &); >+ void analyze_address_fragment (address_info &); >+ void record_address_fragment (gimple *, unsigned HOST_WIDE_INT, >+ tree, unsigned HOST_WIDE_INT, HOST_WIDE_INT); >+ void analyze_expr (gimple *, tree); >+ bool analyze_block (basic_block); >+ bool analyze_blocks (); >+ >+ void prune_loop_conditions (struct loop *, vr_values *); >+ bool prune_conditions (); >+ >+ void merge_loop_info (struct loop *, struct loop *); >+ void add_loop_to_queue (struct loop *); >+ bool decide_whether_loop_is_versionable (struct loop *); >+ bool make_versioning_decisions (); >+ >+ bool version_loop (struct loop *); >+ void implement_versioning_decisions (); >+ >+ /* The function we're optimizing. */ >+ function *m_fn; >+ >+ /* The obstack to use for all pass-specific bitmaps. */ >+ bitmap_obstack m_bitmap_obstack; >+ >+ /* An obstack to use for general allocation. */ >+ obstack m_obstack; >+ >+ /* The number of loops in the function. */ >+ unsigned int m_nloops; >+ >+ /* The total number of loop version conditions we've found. */ >+ unsigned int m_num_conditions; >+ >+ /* Assume that an address fragment of the form i * stride * scale >+ (for variable stride and constant scale) will not benefit from >+ versioning for stride == 1 when scale is greater than this value. > */ >+ unsigned HOST_WIDE_INT m_maximum_scale; >+ >+ /* Information about each loop. */ >+ auto_vec<loop_info> m_loops; >+ >+ /* Used to form a linked list of blocks that belong to a loop, >+ started by loop_info::block_list. */ >+ auto_vec<basic_block> m_next_block_in_loop; >+ >+ /* The list of loops that we've decided to version. */ >+ auto_vec<struct loop *> m_loops_to_version; >+ >+ /* A table of addresses in the current loop, keyed off their values >+ but not their offsets. */ >+ hash_table <address_info_hasher> m_address_table; >+ >+ /* A list of all addresses in M_ADDRESS_TABLE, in a predictable >order. */ >+ auto_vec <address_info *, 32> m_address_list; >+}; >+ >+/* If EXPR is an SSA name and not a default definition, return the >+ defining statement, otherwise return null. */ >+ >+static gimple * >+maybe_get_stmt (tree expr) >+{ >+ if (TREE_CODE (expr) == SSA_NAME && !SSA_NAME_IS_DEFAULT_DEF (expr)) >+ return SSA_NAME_DEF_STMT (expr); >+ return NULL; >+} >+ >+/* Like maybe_get_stmt, but also return null if the defining >+ statement isn't an assignment. */ >+ >+static gassign * >+maybe_get_assign (tree expr) >+{ >+ return safe_dyn_cast <gassign *> (maybe_get_stmt (expr)); >+} >+ >+/* Return true if this pass should look through a cast of expression >FROM >+ to type TYPE when analyzing pieces of an address. */ >+ >+static bool >+look_through_cast_p (tree type, tree from) >+{ >+ return (INTEGRAL_TYPE_P (TREE_TYPE (from)) == INTEGRAL_TYPE_P (type) >+ && POINTER_TYPE_P (TREE_TYPE (from)) == POINTER_TYPE_P (type)); >+} >+ >+/* Strip all conversions of integers or pointers from EXPR, regardless >+ of whether the conversions are nops. This is useful in the context >+ of this pass because we're not trying to fold or simulate the >+ expression; we just want to see how it's structured. */ >+ >+static tree >+strip_casts (tree expr) >+{ >+ const unsigned int MAX_NITERS = 4; >+ >+ tree type = TREE_TYPE (expr); >+ while (CONVERT_EXPR_P (expr) >+ && look_through_cast_p (type, TREE_OPERAND (expr, 0))) >+ expr = TREE_OPERAND (expr, 0); >+ >+ for (unsigned int niters = 0; niters < MAX_NITERS; ++niters) >+ { >+ gassign *assign = maybe_get_assign (expr); >+ if (assign >+ && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (assign)) >+ && look_through_cast_p (type, gimple_assign_rhs1 (assign))) >+ expr = gimple_assign_rhs1 (assign); >+ else >+ break; >+ } >+ return expr; >+} >+ >+/* Compare two address_term_infos in the same address_info. */ >+ >+static int >+compare_address_terms (const void *a_uncast, const void *b_uncast) >+{ >+ const address_term_info *a = (const address_term_info *) a_uncast; >+ const address_term_info *b = (const address_term_info *) b_uncast; >+ >+ if (a->expr != b->expr) >+ return SSA_NAME_VERSION (a->expr) < SSA_NAME_VERSION (b->expr) ? >-1 : 1; >+ >+ if (a->multiplier != b->multiplier) >+ return a->multiplier < b->multiplier ? -1 : 1; >+ >+ return 0; >+} >+ >+/* Dump ADDRESS using flags FLAGS. */ >+ >+static void >+dump_address_info (dump_flags_t flags, address_info &address) >+{ >+ if (address.base) >+ dump_printf (flags, "%T + ", address.base); >+ for (unsigned int i = 0; i < address.terms.length (); ++i) >+ { >+ if (i != 0) >+ dump_printf (flags, " + "); >+ dump_printf (flags, "%T", address.terms[i].expr); >+ if (address.terms[i].multiplier != 1) >+ dump_printf (flags, " * %wd", address.terms[i].multiplier); >+ } >+ dump_printf (flags, " + [%wd, %wd]", >+ address.min_offset, address.max_offset - 1); >+} >+ >+/* Hash an address_info based on its base and terms. */ >+ >+hashval_t >+address_info_hasher::hash (const address_info *info) >+{ >+ inchash::hash hash; >+ hash.add_int (info->base ? TREE_CODE (info->base) : 0); >+ hash.add_int (info->terms.length ()); >+ for (unsigned int i = 0; i < info->terms.length (); ++i) >+ { >+ hash.add_int (SSA_NAME_VERSION (info->terms[i].expr)); >+ hash.add_hwi (info->terms[i].multiplier); >+ } >+ return hash.end (); >+} >+ >+/* Return true if two address_infos have equal bases and terms. Other >+ properties might be different (such as the statement or constant >+ offset range). */ >+ >+bool >+address_info_hasher::equal (const address_info *a, const address_info >*b) >+{ >+ if (a->base != b->base >+ && (!a->base || !b->base || !operand_equal_p (a->base, b->base, >0))) >+ return false; >+ >+ if (a->terms.length () != b->terms.length ()) >+ return false; >+ >+ for (unsigned int i = 0; i < a->terms.length (); ++i) >+ if (a->terms[i].expr != b->terms[i].expr >+ || a->terms[i].multiplier != b->terms[i].multiplier) >+ return false; >+ >+ return true; >+} >+ >+/* Return true if we want to version the loop, i.e. if we have a >+ specific reason for doing so and no specific reason not to. */ >+ >+bool >+loop_info::worth_versioning_p () const >+{ >+ return (!rejected_p >+ && (!bitmap_empty_p (&unity_names) || subloops_benefit_p)); >+} >+ >+loop_versioning::lv_dom_walker::lv_dom_walker (loop_versioning &lv) >+ : dom_walker (CDI_DOMINATORS), m_lv (lv), m_range_analyzer (false) >+{ >+} >+ >+/* Process BB before processing the blocks it dominates. */ >+ >+edge >+loop_versioning::lv_dom_walker::before_dom