On 09/03/2018 04:43 PM, Richard Biener wrote: > On Mon, 3 Sep 2018, Martin Liška wrote: > >> On 09/03/2018 04:00 PM, Richard Biener wrote: >>> On Mon, 3 Sep 2018, Martin Liška wrote: >>> >>>> On 09/03/2018 02:54 PM, Martin Liška wrote: >>>>> On 09/03/2018 02:41 PM, Richard Biener wrote: >>>>>> On Mon, 3 Sep 2018, Martin Liška wrote: >>>>>> >>>>>>> On 04/25/2018 01:42 PM, Richard Biener wrote: >>>>>>>> >>>>>>>> The following patch^Whack splits $subject files into three, one >>>>>>>> for the predicates (due to an implementation detail) and two for >>>>>>>> the rest - for now into similar LOC size files. >>>>>>>> >>>>>>>> I'd like to get help on the makefile changes to make them less >>>>>>>> verbose, somehow globbing the -[12p] parts. >>>>>>>> >>>>>>>> Also you can see the split point is manually chosen which means >>>>>>>> it will bitrot. Timings for the stage2 compiles on a x86_64 >>>>>>>> box are >>>>>>>> >>>>>>>> gimple-match-p.c 5s >>>>>>>> generic-match-p.c 3s >>>>>>>> gimple-match-1.c 85s >>>>>>>> generic-match-1.c 56s >>>>>>>> gimple-match-2.c 82s >>>>>>>> generic-match-2.c 31s >>>>>>>> >>>>>>>> the required header files are quite big (and of course everything >>>>>>>> needs to be exported without the analysis work becoming too >>>>>>>> cumbersome), >>>>>>>> it's 3342 LOC for gimple-match-head.h and 1556 LOC for >>>>>>>> generic-match-head.h >>>>>>>> >>>>>>>> The machine I tested is quite fast so the 80ish second timings are >>>>>>>> still >>>>>>>> too slow I guess and thus splitting up into four files for gimple and >>>>>>>> three files for generic looks better. >>>>>>>> >>>>>>>> Note we lose some inlining/cloning capability in the splitting process >>>>>>>> (I see quite a bit of constprop/isra work being done on the generated >>>>>>>> files). I didn't try to measure the runtime impact though. >>>>>>>> >>>>>>>> The patch still needs quite some TLC, it really is a bit hacky but I'd >>>>>>>> like to get feedback on the approach and I didn't want to spend time >>>>>>>> on programatically finding optimal split points (so everything is >>>>>>>> output >>>>>>>> in the same semi-random order as before). >>>>>>>> >>>>>>>> Richard. > ... >>>>>>> I took a look at gimple-match.c and what about doing a split in >>>>>>> following way: >>>>>>> - all gimple_simplify_$number going into a separate header file (~12000 >>>>>>> LOC) >>>>>>> - all the function can be marked as static inline >>>>>>> - all other gimple_simplify_$code can be split into arbitrary number of >>>>>>> parts >>>>>>> - we have 287 such functions where each function only calls >>>>>>> gimple_simplify_$number and >>>>>>> on average there 10 of such calls >>>>>>> - that would allow to remove most of gimple_simplify_$number functions >>>>>>> from the header file >>>>>>> >>>>>>> Richi do you think it will be viable? >>>>>> >>>>>> That relies on the cgraph code DCEing all unused gimple_simplify_$number >>>>>> functions from the header fast as they are now effectively duplicated >>>>>> into all parts, correct? Also I'm not sure if we actually want to inline >>>>>> them... they are split out to get both code size and compile-time >>>>>> under control. Unfortunately we have still high max-inline-insns-single >>>>>> which is used for inline marked functions. >>>>>> >>>>>> Eventually doing a "proper" partitioning algorithm is viable, that is, >>>>>> partition based on gimple_simplify_$code and put gimple_simplify_$number >>>>>> where they are used. If they are used across different codes then >>>>>> merge those partitions. I guess you'll see that that'll merge the >>>>>> biggest _$code parititions :/ (MINUS_EXPR, PLUS_EXPR). >>>>> >>>>> Yes, that should be much better. I'm attaching a 'callgraph' that was >>>>> done by grepping. >>>>> Function starting at the beginning of a line is function definition, with >>>>> an indentation >>>>> one can see calls. >>>>> >>>>> Yes, PLUS and MINUS call ~20 gimple_simplify_$number calls. >>>>> >>>>> Well, generating some simple call graph format for the source file and a >>>>> source file >>>>> annotation of nodes can be input for a partitioning tool that can do the >>>>> split. >>>>> >>>>> Issue with the generated files is that one needs to fix the most >>>>> offenders (*-match.c, insn-recog.c, insn-emit.c, ..) >>>>> in order to see some improvement. >>>>> >>>>> Looking at insn-recog.c, maybe similar callgraph-based split can be done >>>>> for recog_$number functions? >>>>> >>>>> Martin >>>>> >>>>>> >>>>>> Richard. >>>>>> >>>>> >>>> >>>> I'm sending SCC components for gimple-match.c. So there are 3 quite big >>>> one and rest is small. It's questionable >>>> whether partitioning based on that will provide desired speed up? >>> >>> When I experimented split based on # of functions wasn't working well, >>> only split based on # of lines did. I'd still expect that eventually >>> basing the split on the SCC components makes sense if you use say, >>> the biggest 4 (but measure size in # lines) and merge the rest evenly. >> >> I see! Note that shrinking gimple-match.o 4 times will be probably >> sufficient for general >> speed up: >> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440 >> >>> >>> It would be nice if that all would be scriptable instead of coding it >>> into genmatch.c but that's of course possible as well - just add >>> some extra "passes" over code-gen as I did in the hac^Wpatch. You >> >> That would be my plan, genmatch can mark in C comments function that can be >> partitioned >> and callgraph of these functions. >> >>> could use graphds.c routines to compute SCCs for example. Knowing >>> # lines beforehand is a bit hard though - code-generating into >>> a set of character buffers might be possible but I wired everything >>> to use FILE ... (and no stringstreams in the C library). >>> And no, please do not convert to C++ streams ;)) >> >> ... and a C++ splitter can do the rest: read content, do SCC, split to N >> parts >> and stream out. >> >> I can work on that. Questionable is still Makefile integration of such >> parallelism? > > Well, just hard-code the number of pieces and thus the pices to > compile in the end... > > Of course we shouldn't add any additional build dependencies > (a "C++ splitter"). Adding additional annotation to the genmatch > generated sources may be OK but then eventually doing everything in > genmatch isn't too complicated (putting aside that # of lines metric...). > > Richard. >
Hi. There's working solution for {gimple,generic}-match.c. It introduces splitter.cc which parses annotated generated files and split it according to call graph provided by annotations. Currently I copied a SCC implementation as graphds.c has quite some dependencies (xrealloc, bitmap,..). Questionable how to reduce the dependencies? For gimple-match.c the biggest SCC component contains about 1/4 LOC, which means maximal reasonable number of parts is equal to 4. Doing that the biggest component is compiled with -O2 14.5s, which the original gimple-match.c 50.0s. That sounds promising: l gimple-match-part*.c -rw-r--r-- 1 marxin users 791442 Sep 4 17:02 gimple-match-part-0.c -rw-r--r-- 1 marxin users 748610 Sep 4 17:02 gimple-match-part-1.c -rw-r--r-- 1 marxin users 783749 Sep 4 17:02 gimple-match-part-2.c -rw-r--r-- 1 marxin users 828990 Sep 4 17:02 gimple-match-part-3.c -rw-r--r-- 1 marxin users 103130 Sep 4 17:02 gimple-match-part-footer.c for generic-match.c: l generic-match-part*.c -rw-r--r-- 1 marxin users 528712 Sep 4 17:02 generic-match-part-0.c -rw-r--r-- 1 marxin users 517120 Sep 4 17:02 generic-match-part-1.c -rw-r--r-- 1 marxin users 519777 Sep 4 17:02 generic-match-part-2.c -rw-r--r-- 1 marxin users 221971 Sep 4 17:02 generic-match-part-3.c -rw-r--r-- 1 marxin users 14596 Sep 4 17:02 generic-match-part-footer.c There are some issues with functions that live in gimple-match-head.c, these should live in a header file the parts can include. I also renamed gimple_simplify function that live in gimple-match.c to gimple_simplify_generated as they clash with these that are in gimple-match-head.c. There are still various questions: - how to make the split process optional, how to modify then Makefile? - SCC implementation should be shared with graphds.c - splitter.cc should be cleaned up - in order to achieve real speed up we need to split also other generated (and also dwarf2out.c, i386.c, ..) files: here I'm most concerned about insn-recog.c, which can't be split the same way without ending up with a single huge SCC component. Ideas what to do with that file? Thoughts? Martin
>From 0f57bea9576867140846abb8703dba878eb20c70 Mon Sep 17 00:00:00 2001 From: marxin <mli...@suse.cz> Date: Tue, 4 Sep 2018 14:54:17 +0200 Subject: [PATCH] Come up with splitter for gimple and generic match files. --- gcc/Makefile.in | 49 ++++- gcc/genmatch.c | 57 ++++-- gcc/gimple-match-head-common.h | 191 ++++++++++++++++++ gcc/gimple-match-head.c | 211 +++---------------- gcc/splitter.cc | 357 +++++++++++++++++++++++++++++++++ 5 files changed, 647 insertions(+), 218 deletions(-) create mode 100644 gcc/gimple-match-head-common.h create mode 100644 gcc/splitter.cc diff --git a/gcc/Makefile.in b/gcc/Makefile.in index e008f63b2ea..3cae9bcfd8d 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -218,8 +218,16 @@ gengtype-lex.o-warn = -Wno-error libgcov-util.o-warn = -Wno-error libgcov-driver-tool.o-warn = -Wno-error libgcov-merge-tool.o-warn = -Wno-error -gimple-match.o-warn = -Wno-unused -generic-match.o-warn = -Wno-unused +gimple-match-part-0.o-warn = -Wno-unused +gimple-match-part-1.o-warn = -Wno-unused +gimple-match-part-2.o-warn = -Wno-unused +gimple-match-part-3.o-warn = -Wno-unused +gimple-match-part-footer.o-warn = -Wno-unused +generic-match-part-0.o-warn = -Wno-unused +generic-match-part-1.o-warn = -Wno-unused +generic-match-part-2.o-warn = -Wno-unused +generic-match-part-3.o-warn = -Wno-unused +generic-match-part-footer.o-warn = -Wno-unused dfp.o-warn = -Wno-strict-aliasing # All warnings have to be shut off in stage1 if the compiler used then @@ -522,7 +530,7 @@ CROSS_SYSTEM_HEADER_DIR = @CROSS_SYSTEM_HEADER_DIR@ # to directories that might not exist yet. # The sed idiom for this is to repeat the search-and-replace until it doesn't match, using :a ... ta. # Use single quotes here to avoid nested double- and backquotes, this -# macro is also used in a double-quoted context. +# macro is also used inmem_alloc_description a double-quoted context. SYSTEM_HEADER_DIR = `echo @SYSTEM_HEADER_DIR@ | sed -e :a -e 's,[^/]*/\.\.\/,,' -e ta` # Path to the system headers on the build machine. @@ -1209,8 +1217,17 @@ C_COMMON_OBJS = c-family/c-common.o c-family/c-cppbuiltin.o c-family/c-dump.o \ # will build them sooner, because they are large and otherwise tend to be # the last objects to finish building. OBJS = \ - gimple-match.o \ - generic-match.o \ + gimple-match-part-0.o \ + gimple-match-part-1.o \ + gimple-match-part-2.o \ + gimple-match-part-3.o \ + gimple-match-part-footer.o \ + gimple-match-head.o \ + generic-match-part-0.o \ + generic-match-part-1.o \ + generic-match-part-2.o \ + generic-match-part-3.o \ + generic-match-part-footer.o \ insn-attrtab.o \ insn-automata.o \ insn-dfatab.o \ @@ -2507,10 +2524,19 @@ s-tm-texi: build/genhooks$(build_exeext) $(srcdir)/doc/tm.texi.in false; \ fi -gimple-match.c: s-match gimple-match-head.c ; @true -generic-match.c: s-match generic-match-head.c ; @true +gimple-match-part-0.c: s-match gimple-match-head.c ; @true +gimple-match-part-1.c: s-match gimple-match-head.c ; @true +gimple-match-part-2.c: s-match gimple-match-head.c ; @true +gimple-match-part-3.c: s-match gimple-match-head.c ; @true +gimple-match-part-footer.c: s-match gimple-match-head.c ; @true -s-match: build/genmatch$(build_exeext) $(srcdir)/match.pd cfn-operators.pd +generic-match-part-0.c: s-match generic-match-head.c ; @true +generic-match-part-1.c: s-match generic-match-head.c ; @true +generic-match-part-2.c: s-match generic-match-head.c ; @true +generic-match-part-3.c: s-match generic-match-head.c ; @true +generic-match-part-footer.c: s-match generic-match-head.c ; @true + +s-match: build/genmatch$(build_exeext) $(srcdir)/match.pd cfn-operators.pd build/splitter$(build_exeext) $(RUN_GEN) build/genmatch$(build_exeext) --gimple $(srcdir)/match.pd \ > tmp-gimple-match.c $(RUN_GEN) build/genmatch$(build_exeext) --generic $(srcdir)/match.pd \ @@ -2519,6 +2545,8 @@ s-match: build/genmatch$(build_exeext) $(srcdir)/match.pd cfn-operators.pd gimple-match.c $(SHELL) $(srcdir)/../move-if-change tmp-generic-match.c \ generic-match.c + build/splitter$(build_exeext) generic + build/splitter$(build_exeext) gimple $(STAMP) s-match GTFILES = $(CPPLIB_H) $(srcdir)/input.h $(srcdir)/coretypes.h \ @@ -2793,6 +2821,7 @@ build/genmatch.o : genmatch.c $(BCONFIG_H) $(SYSTEM_H) \ build/gencfn-macros.o : gencfn-macros.c $(BCONFIG_H) $(SYSTEM_H) \ $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-set.h builtins.def \ internal-fn.def +build/splitter.o: splitter.cc # Compile the programs that generate insn-* from the machine description. # They are compiled with $(COMPILER_FOR_BUILD), and associated libraries, @@ -2836,6 +2865,10 @@ endif build/genmatch$(build_exeext) : $(BUILD_CPPLIB) \ $(BUILD_ERRORS) build/vec.o build/hash-table.o build/sort.o +build/splitter$(build_exeext) : $(BUILD_CPPLIB) \ + $(BUILD_ERRORS) splitter.o + g++ -o $@ $^ + # These programs are not linked with the MD reader. build/gengtype$(build_exeext) : build/gengtype-lex.o build/gengtype-parse.o \ build/gengtype-state.o build/version.o build/errors.o diff --git a/gcc/genmatch.c b/gcc/genmatch.c index 5f1691ae206..3a06210d0c2 100644 --- a/gcc/genmatch.c +++ b/gcc/genmatch.c @@ -3551,6 +3551,7 @@ dt_simplify::gen (FILE *f, int indent, bool gimple) /* If we have a split-out function for the actual transform, call it. */ if (info && info->fname) { + fprintf (f, "// call-fn:%s\n", info->fname); if (gimple) { fprintf_indent (f, indent, "if (%s (res_op, seq, " @@ -3682,6 +3683,27 @@ sinfo_hashmap_traits::equal_keys (const key_type &v, return compare_op (v->s->result, v->s, candidate->s->result, candidate->s); } +static void +print_simplify_fnheader (FILE *f, bool gimple, const char *op, unsigned n, bool declaration) +{ + if (gimple) + fprintf (f, "%sbool\n" + "gimple_simplify_%s (gimple_match_op *res_op," + " gimple_seq *seq,\n" + " tree (*valueize)(tree) " + "ATTRIBUTE_UNUSED,\n" + " code_helper ARG_UNUSED (code), tree " + "ARG_UNUSED (type)", declaration ? "extern " : "", + op); + else + fprintf (f, "extern tree\n" + "generic_simplify_%s (location_t ARG_UNUSED (loc), enum " + "tree_code ARG_UNUSED (code), const tree ARG_UNUSED (type)", + op); + for (unsigned i = 0; i < n; ++i) + fprintf (f, ", tree op%d", i); + fprintf (f, ")%s\n", declaration ? ";" : ""); +} /* Main entry to generate code for matching GIMPLE IL off the decision tree. */ @@ -3719,8 +3741,10 @@ decision_tree::gen (FILE *f, bool gimple) /* Generate a split out function with the leaf transform code. */ s->fname = xasprintf ("%s_simplify_%u", gimple ? "gimple" : "generic", fcnt++); + + fprintf (f, "\n// split-fn-begin:%s\n", s->fname); if (gimple) - fprintf (f, "\nstatic bool\n" + fprintf (f, "static bool\n" "%s (gimple_match_op *res_op, gimple_seq *seq,\n" " tree (*valueize)(tree) ATTRIBUTE_UNUSED,\n" " const tree ARG_UNUSED (type), tree *ARG_UNUSED " @@ -3755,6 +3779,7 @@ decision_tree::gen (FILE *f, bool gimple) else fprintf (f, " return NULL_TREE;\n"); fprintf (f, "}\n"); + fprintf (f, "// split-fn-end\n"); } fprintf (stderr, "removed %u duplicate tails\n", rcnt); @@ -3774,23 +3799,10 @@ decision_tree::gen (FILE *f, bool gimple) && e->operation->kind != id_base::CODE)) continue; - if (gimple) - fprintf (f, "\nstatic bool\n" - "gimple_simplify_%s (gimple_match_op *res_op," - " gimple_seq *seq,\n" - " tree (*valueize)(tree) " - "ATTRIBUTE_UNUSED,\n" - " code_helper ARG_UNUSED (code), tree " - "ARG_UNUSED (type)\n", - e->operation->id); - else - fprintf (f, "\nstatic tree\n" - "generic_simplify_%s (location_t ARG_UNUSED (loc), enum " - "tree_code ARG_UNUSED (code), const tree ARG_UNUSED (type)", - e->operation->id); - for (unsigned i = 0; i < n; ++i) - fprintf (f, ", tree op%d", i); - fprintf (f, ")\n"); + print_simplify_fnheader (f, gimple, e->operation->id, n, true); + fprintf (f, "\n// split-fn-begin:%s_%s\n", gimple ? "gimple" : "generic", + e->operation->id); + print_simplify_fnheader (f, gimple, e->operation->id, n, false); fprintf (f, "{\n"); dop->gen_kids (f, 2, gimple); if (gimple) @@ -3798,13 +3810,14 @@ decision_tree::gen (FILE *f, bool gimple) else fprintf (f, " return NULL_TREE;\n"); fprintf (f, "}\n"); + fprintf (f, "// split-fn-end\n"); } /* Then generate the main entry with the outermost switch and tail-calls to the split-out functions. */ if (gimple) - fprintf (f, "\nstatic bool\n" - "gimple_simplify (gimple_match_op *res_op, gimple_seq *seq,\n" + fprintf (f, "\nbool\n" + "gimple_simplify_generated (gimple_match_op *res_op, gimple_seq *seq,\n" " tree (*valueize)(tree) ATTRIBUTE_UNUSED,\n" " code_helper code, const tree type"); else @@ -3868,7 +3881,7 @@ decision_tree::gen (FILE *f, bool gimple) void write_predicate (FILE *f, predicate_id *p, decision_tree &dt, bool gimple) { - fprintf (f, "\nbool\n" + fprintf (f, "\nstatic bool\n" "%s%s (tree t%s%s)\n" "{\n", gimple ? "gimple_" : "tree_", p->id, p->nargs > 0 ? ", tree *res_ops" : "", @@ -5117,7 +5130,7 @@ add_operator (VIEW_CONVERT2, "view_convert2", "tcc_unary", 1); parser p (r); if (gimple) - write_header (stdout, "gimple-match-head.c"); + write_header (stdout, "gimple-match-head-common.h"); else write_header (stdout, "generic-match-head.c"); diff --git a/gcc/gimple-match-head-common.h b/gcc/gimple-match-head-common.h new file mode 100644 index 00000000000..a854db32e7d --- /dev/null +++ b/gcc/gimple-match-head-common.h @@ -0,0 +1,191 @@ +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "target.h" +#include "rtl.h" +#include "tree.h" +#include "gimple.h" +#include "ssa.h" +#include "cgraph.h" +#include "fold-const.h" +#include "fold-const-call.h" +#include "stor-layout.h" +#include "gimple-fold.h" +#include "calls.h" +#include "tree-dfa.h" +#include "builtins.h" +#include "gimple-match.h" +#include "tree-pass.h" +#include "internal-fn.h" +#include "case-cfn-macros.h" +#include "gimplify.h" +#include "optabs-tree.h" +#include "tree-eh.h" + +/* Return whether T is a constant that we'll dispatch to fold to + evaluate fully constant expressions. */ + +static inline bool +constant_for_folding (tree t) +{ + return (CONSTANT_CLASS_P (t) + /* The following is only interesting to string builtins. */ + || (TREE_CODE (t) == ADDR_EXPR + && TREE_CODE (TREE_OPERAND (t, 0)) == STRING_CST)); +} + +/* Helper for gimple_simplify_generated valueizing OP using VALUEIZE and setting + VALUEIZED to true if valueization changed OP. */ + +static inline tree +do_valueize (tree op, tree (*valueize)(tree), bool &valueized) +{ + if (valueize && TREE_CODE (op) == SSA_NAME) + { + tree tem = valueize (op); + if (tem && tem != op) + { + op = tem; + valueized = true; + } + } + return op; +} + +/* Routine to determine if the types T1 and T2 are effectively + the same for GIMPLE. If T1 or T2 is not a type, the test + applies to their TREE_TYPE. */ + +static inline bool +types_match (tree t1, tree t2) +{ + if (!TYPE_P (t1)) + t1 = TREE_TYPE (t1); + if (!TYPE_P (t2)) + t2 = TREE_TYPE (t2); + + return types_compatible_p (t1, t2); +} + +/* Return if T has a single use. For GIMPLE, we also allow any + non-SSA_NAME (ie constants) and zero uses to cope with uses + that aren't linked up yet. */ + +static inline bool +single_use (tree t) +{ + return TREE_CODE (t) != SSA_NAME || has_zero_uses (t) || has_single_use (t); +} + +/* Return true if math operations should be canonicalized, + e.g. sqrt(sqrt(x)) -> pow(x, 0.25). */ + +static inline bool +canonicalize_math_p () +{ + return !cfun || (cfun->curr_properties & PROP_gimple_opt_math) == 0; +} + +/* Return true if math operations that are beneficial only after + vectorization should be canonicalized. */ + +static inline bool +canonicalize_math_after_vectorization_p () +{ + return !cfun || (cfun->curr_properties & PROP_gimple_lvec) != 0; +} + +/* Helper for the autogenerated code, get at the definition of NAME when + VALUEIZE allows that. */ + +static inline gimple * +get_def (tree (*valueize)(tree), tree name) +{ + if (valueize && ! valueize (name)) + return NULL; + return SSA_NAME_DEF_STMT (name); +} + +/* Helper for the autogenerated code, valueize OP. */ + +static inline tree +do_valueize (tree (*valueize)(tree), tree op) +{ + if (valueize && TREE_CODE (op) == SSA_NAME) + { + tree tem = valueize (op); + if (tem) + return tem; + } + return op; +} + + +/* Return true if pow(cst, x) should be optimized into exp(log(cst) * x). + As a workaround for SPEC CPU2017 628.pop2_s, don't do it if arg0 + is an exact integer, arg1 = phi_res +/- cst1 and phi_res = PHI <cst2, ...> + where cst2 +/- cst1 is an exact integer, because then pow (arg0, arg1) + will likely be exact, while exp (log (arg0) * arg1) might be not. + Also don't do it if arg1 is phi_res above and cst2 is an exact integer. */ + +static bool +optimize_pow_to_exp (tree arg0, tree arg1) +{ + gcc_assert (TREE_CODE (arg0) == REAL_CST); + if (!real_isinteger (TREE_REAL_CST_PTR (arg0), TYPE_MODE (TREE_TYPE (arg0)))) + return true; + + if (TREE_CODE (arg1) != SSA_NAME) + return true; + + gimple *def = SSA_NAME_DEF_STMT (arg1); + gphi *phi = dyn_cast <gphi *> (def); + tree cst1 = NULL_TREE; + enum tree_code code = ERROR_MARK; + if (!phi) + { + if (!is_gimple_assign (def)) + return true; + code = gimple_assign_rhs_code (def); + switch (code) + { + case PLUS_EXPR: + case MINUS_EXPR: + break; + default: + return true; + } + if (TREE_CODE (gimple_assign_rhs1 (def)) != SSA_NAME + || TREE_CODE (gimple_assign_rhs2 (def)) != REAL_CST) + return true; + + cst1 = gimple_assign_rhs2 (def); + + phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (gimple_assign_rhs1 (def))); + if (!phi) + return true; + } + + tree cst2 = NULL_TREE; + int n = gimple_phi_num_args (phi); + for (int i = 0; i < n; i++) + { + tree arg = PHI_ARG_DEF (phi, i); + if (TREE_CODE (arg) != REAL_CST) + continue; + else if (cst2 == NULL_TREE) + cst2 = arg; + else if (!operand_equal_p (cst2, arg, 0)) + return true; + } + + if (cst1 && cst2) + cst2 = const_binop (code, TREE_TYPE (cst2), cst2, cst1); + if (cst2 + && TREE_CODE (cst2) == REAL_CST + && real_isinteger (TREE_REAL_CST_PTR (cst2), + TYPE_MODE (TREE_TYPE (cst2)))) + return false; + return true; +} diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c index 414007ec1f9..f6cffedbc2f 100644 --- a/gcc/gimple-match-head.c +++ b/gcc/gimple-match-head.c @@ -41,35 +41,25 @@ along with GCC; see the file COPYING3. If not see #include "gimplify.h" #include "optabs-tree.h" #include "tree-eh.h" +#include "gimple-match-head-common.h" /* Forward declarations of the private auto-generated matchers. They expect valueized operands in canonical order and do not perform simplification of all-constant operands. */ -static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), +extern bool gimple_simplify_generated (gimple_match_op *, gimple_seq *, tree (*)(tree), code_helper, tree, tree); -static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), +extern bool gimple_simplify_generated (gimple_match_op *, gimple_seq *, tree (*)(tree), code_helper, tree, tree, tree); -static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), +extern bool gimple_simplify_generated (gimple_match_op *, gimple_seq *, tree (*)(tree), code_helper, tree, tree, tree, tree); -static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), +extern bool gimple_simplify_generated (gimple_match_op *, gimple_seq *, tree (*)(tree), code_helper, tree, tree, tree, tree, tree); -static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), +extern bool gimple_simplify_generated (gimple_match_op *, gimple_seq *, tree (*)(tree), code_helper, tree, tree, tree, tree, tree, tree); -const unsigned int gimple_match_op::MAX_NUM_OPS; - -/* Return whether T is a constant that we'll dispatch to fold to - evaluate fully constant expressions. */ -static inline bool -constant_for_folding (tree t) -{ - return (CONSTANT_CLASS_P (t) - /* The following is only interesting to string builtins. */ - || (TREE_CODE (t) == ADDR_EXPR - && TREE_CODE (TREE_OPERAND (t, 0)) == STRING_CST)); -} +const unsigned int gimple_match_op::MAX_NUM_OPS; /* Try to convert conditional operation ORIG_OP into an IFN_COND_* operation. Return true on success, storing the new operation in NEW_OP. */ @@ -167,7 +157,7 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op, } /* Helper that matches and simplifies the toplevel result from - a gimple_simplify run (where we don't want to build + a gimple_simplify_generated run (where we don't want to build a stmt in case it's used in in-place folding). Replaces RES_OP with a simplified and/or canonicalized result and returns whether any change was made. */ @@ -211,7 +201,7 @@ gimple_resimplify1 (gimple_seq *seq, gimple_match_op *res_op, ++depth; gimple_match_op res_op2 (*res_op); - if (gimple_simplify (&res_op2, seq, valueize, + if (gimple_simplify_generated (&res_op2, seq, valueize, res_op->code, res_op->type, res_op->ops[0])) { --depth; @@ -227,7 +217,7 @@ gimple_resimplify1 (gimple_seq *seq, gimple_match_op *res_op, } /* Helper that matches and simplifies the toplevel result from - a gimple_simplify run (where we don't want to build + a gimple_simplify_generated run (where we don't want to build a stmt in case it's used in in-place folding). Replaces RES_OP with a simplified and/or canonicalized result and returns whether any change was made. */ @@ -282,7 +272,7 @@ gimple_resimplify2 (gimple_seq *seq, gimple_match_op *res_op, ++depth; gimple_match_op res_op2 (*res_op); - if (gimple_simplify (&res_op2, seq, valueize, + if (gimple_simplify_generated (&res_op2, seq, valueize, res_op->code, res_op->type, res_op->ops[0], res_op->ops[1])) { @@ -299,7 +289,7 @@ gimple_resimplify2 (gimple_seq *seq, gimple_match_op *res_op, } /* Helper that matches and simplifies the toplevel result from - a gimple_simplify run (where we don't want to build + a gimple_simplify_generated run (where we don't want to build a stmt in case it's used in in-place folding). Replaces RES_OP with a simplified and/or canonicalized result and returns whether any change was made. */ @@ -353,7 +343,7 @@ gimple_resimplify3 (gimple_seq *seq, gimple_match_op *res_op, ++depth; gimple_match_op res_op2 (*res_op); - if (gimple_simplify (&res_op2, seq, valueize, + if (gimple_simplify_generated (&res_op2, seq, valueize, res_op->code, res_op->type, res_op->ops[0], res_op->ops[1], res_op->ops[2])) { @@ -370,7 +360,7 @@ gimple_resimplify3 (gimple_seq *seq, gimple_match_op *res_op, } /* Helper that matches and simplifies the toplevel result from - a gimple_simplify run (where we don't want to build + a gimple_simplify_generated run (where we don't want to build a stmt in case it's used in in-place folding). Replaces RES_OP with a simplified and/or canonicalized result and returns whether any change was made. */ @@ -393,7 +383,7 @@ gimple_resimplify4 (gimple_seq *seq, gimple_match_op *res_op, ++depth; gimple_match_op res_op2 (*res_op); - if (gimple_simplify (&res_op2, seq, valueize, + if (gimple_simplify_generated (&res_op2, seq, valueize, res_op->code, res_op->type, res_op->ops[0], res_op->ops[1], res_op->ops[2], res_op->ops[3])) @@ -411,7 +401,7 @@ gimple_resimplify4 (gimple_seq *seq, gimple_match_op *res_op, } /* Helper that matches and simplifies the toplevel result from - a gimple_simplify run (where we don't want to build + a gimple_simplify_generated run (where we don't want to build a stmt in case it's used in in-place folding). Replaces RES_OP with a simplified and/or canonicalized result and returns whether any change was made. */ @@ -423,7 +413,7 @@ gimple_resimplify5 (gimple_seq *seq, gimple_match_op *res_op, /* No constant folding is defined for five-operand functions. */ gimple_match_op res_op2 (*res_op); - if (gimple_simplify (&res_op2, seq, valueize, + if (gimple_simplify_generated (&res_op2, seq, valueize, res_op->code, res_op->type, res_op->ops[0], res_op->ops[1], res_op->ops[2], res_op->ops[3], res_op->ops[4])) @@ -616,7 +606,7 @@ gimple_simplify (enum tree_code code, tree type, } gimple_match_op res_op; - if (!gimple_simplify (&res_op, seq, valueize, code, type, op0)) + if (!gimple_simplify_generated (&res_op, seq, valueize, code, type, op0)) return NULL_TREE; return maybe_push_res_to_seq (&res_op, seq); } @@ -648,7 +638,7 @@ gimple_simplify (enum tree_code code, tree type, } gimple_match_op res_op; - if (!gimple_simplify (&res_op, seq, valueize, code, type, op0, op1)) + if (!gimple_simplify_generated (&res_op, seq, valueize, code, type, op0, op1)) return NULL_TREE; return maybe_push_res_to_seq (&res_op, seq); } @@ -676,7 +666,7 @@ gimple_simplify (enum tree_code code, tree type, std::swap (op0, op1); gimple_match_op res_op; - if (!gimple_simplify (&res_op, seq, valueize, code, type, op0, op1, op2)) + if (!gimple_simplify_generated (&res_op, seq, valueize, code, type, op0, op1, op2)) return NULL_TREE; return maybe_push_res_to_seq (&res_op, seq); } @@ -696,7 +686,7 @@ gimple_simplify (combined_fn fn, tree type, } gimple_match_op res_op; - if (!gimple_simplify (&res_op, seq, valueize, fn, type, arg0)) + if (!gimple_simplify_generated (&res_op, seq, valueize, fn, type, arg0)) return NULL_TREE; return maybe_push_res_to_seq (&res_op, seq); } @@ -717,7 +707,7 @@ gimple_simplify (combined_fn fn, tree type, } gimple_match_op res_op; - if (!gimple_simplify (&res_op, seq, valueize, fn, type, arg0, arg1)) + if (!gimple_simplify_generated (&res_op, seq, valueize, fn, type, arg0, arg1)) return NULL_TREE; return maybe_push_res_to_seq (&res_op, seq); } @@ -739,29 +729,11 @@ gimple_simplify (combined_fn fn, tree type, } gimple_match_op res_op; - if (!gimple_simplify (&res_op, seq, valueize, fn, type, arg0, arg1, arg2)) + if (!gimple_simplify_generated (&res_op, seq, valueize, fn, type, arg0, arg1, arg2)) return NULL_TREE; return maybe_push_res_to_seq (&res_op, seq); } -/* Helper for gimple_simplify valueizing OP using VALUEIZE and setting - VALUEIZED to true if valueization changed OP. */ - -static inline tree -do_valueize (tree op, tree (*valueize)(tree), bool &valueized) -{ - if (valueize && TREE_CODE (op) == SSA_NAME) - { - tree tem = valueize (op); - if (tem && tem != op) - { - op = tem; - valueized = true; - } - } - return op; -} - /* If RES_OP is a call to a conditional internal function, try simplifying the associated unconditional operation and using the result to build a new conditional operation. For example, if RES_OP is: @@ -1019,140 +991,3 @@ gimple_simplify (gimple *stmt, gimple_match_op *res_op, gimple_seq *seq, return false; } - - -/* Helper for the autogenerated code, valueize OP. */ - -inline tree -do_valueize (tree (*valueize)(tree), tree op) -{ - if (valueize && TREE_CODE (op) == SSA_NAME) - { - tree tem = valueize (op); - if (tem) - return tem; - } - return op; -} - -/* Helper for the autogenerated code, get at the definition of NAME when - VALUEIZE allows that. */ - -inline gimple * -get_def (tree (*valueize)(tree), tree name) -{ - if (valueize && ! valueize (name)) - return NULL; - return SSA_NAME_DEF_STMT (name); -} - -/* Routine to determine if the types T1 and T2 are effectively - the same for GIMPLE. If T1 or T2 is not a type, the test - applies to their TREE_TYPE. */ - -static inline bool -types_match (tree t1, tree t2) -{ - if (!TYPE_P (t1)) - t1 = TREE_TYPE (t1); - if (!TYPE_P (t2)) - t2 = TREE_TYPE (t2); - - return types_compatible_p (t1, t2); -} - -/* Return if T has a single use. For GIMPLE, we also allow any - non-SSA_NAME (ie constants) and zero uses to cope with uses - that aren't linked up yet. */ - -static inline bool -single_use (tree t) -{ - return TREE_CODE (t) != SSA_NAME || has_zero_uses (t) || has_single_use (t); -} - -/* Return true if math operations should be canonicalized, - e.g. sqrt(sqrt(x)) -> pow(x, 0.25). */ - -static inline bool -canonicalize_math_p () -{ - return !cfun || (cfun->curr_properties & PROP_gimple_opt_math) == 0; -} - -/* Return true if math operations that are beneficial only after - vectorization should be canonicalized. */ - -static inline bool -canonicalize_math_after_vectorization_p () -{ - return !cfun || (cfun->curr_properties & PROP_gimple_lvec) != 0; -} - -/* Return true if pow(cst, x) should be optimized into exp(log(cst) * x). - As a workaround for SPEC CPU2017 628.pop2_s, don't do it if arg0 - is an exact integer, arg1 = phi_res +/- cst1 and phi_res = PHI <cst2, ...> - where cst2 +/- cst1 is an exact integer, because then pow (arg0, arg1) - will likely be exact, while exp (log (arg0) * arg1) might be not. - Also don't do it if arg1 is phi_res above and cst2 is an exact integer. */ - -static bool -optimize_pow_to_exp (tree arg0, tree arg1) -{ - gcc_assert (TREE_CODE (arg0) == REAL_CST); - if (!real_isinteger (TREE_REAL_CST_PTR (arg0), TYPE_MODE (TREE_TYPE (arg0)))) - return true; - - if (TREE_CODE (arg1) != SSA_NAME) - return true; - - gimple *def = SSA_NAME_DEF_STMT (arg1); - gphi *phi = dyn_cast <gphi *> (def); - tree cst1 = NULL_TREE; - enum tree_code code = ERROR_MARK; - if (!phi) - { - if (!is_gimple_assign (def)) - return true; - code = gimple_assign_rhs_code (def); - switch (code) - { - case PLUS_EXPR: - case MINUS_EXPR: - break; - default: - return true; - } - if (TREE_CODE (gimple_assign_rhs1 (def)) != SSA_NAME - || TREE_CODE (gimple_assign_rhs2 (def)) != REAL_CST) - return true; - - cst1 = gimple_assign_rhs2 (def); - - phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (gimple_assign_rhs1 (def))); - if (!phi) - return true; - } - - tree cst2 = NULL_TREE; - int n = gimple_phi_num_args (phi); - for (int i = 0; i < n; i++) - { - tree arg = PHI_ARG_DEF (phi, i); - if (TREE_CODE (arg) != REAL_CST) - continue; - else if (cst2 == NULL_TREE) - cst2 = arg; - else if (!operand_equal_p (cst2, arg, 0)) - return true; - } - - if (cst1 && cst2) - cst2 = const_binop (code, TREE_TYPE (cst2), cst2, cst1); - if (cst2 - && TREE_CODE (cst2) == REAL_CST - && real_isinteger (TREE_REAL_CST_PTR (cst2), - TYPE_MODE (TREE_TYPE (cst2)))) - return false; - return true; -} diff --git a/gcc/splitter.cc b/gcc/splitter.cc new file mode 100644 index 00000000000..528d814b9a9 --- /dev/null +++ b/gcc/splitter.cc @@ -0,0 +1,357 @@ +#include <sstream> +#include <string> +#include <fstream> +#include <vector> +#include <map> +#include <algorithm> + +#include <iostream> +#include <list> +#include <stack> + +#define NIL -1 + +using namespace std; + +// A class that represents an directed graph +class Graph +{ + int V; // No. of vertices + list<int> *adj; // A dynamic array of adjacency lists + + // A Recursive DFS based function used by SCC() + void SCCUtil(int u, int disc[], int low[], + stack<int> *st, bool stackMember[]); +public: + Graph(int V); // Constructor + void addEdge(int v, int w); // function to add an edge to graph + void SCC(); // prints strongly connected components + + vector<vector<int>> components; +}; + +Graph::Graph(int V) +{ + this->V = V; + adj = new list<int>[V]; +} + +void Graph::addEdge(int v, int w) +{ + adj[v].push_back(w); +} + +// A recursive function that finds and prints strongly connected +// components using DFS traversal +// u --> The vertex to be visited next +// disc[] --> Stores discovery times of visited vertices +// low[] -- >> earliest visited vertex (the vertex with minimum +// discovery time) that can be reached from subtree +// rooted with current vertex +// *st -- >> To store all the connected ancestors (could be part +// of SCC) +// stackMember[] --> bit/index array for faster check whether +// a node is in stack +void Graph::SCCUtil(int u, int disc[], int low[], stack<int> *st, + bool stackMember[]) +{ + // A static variable is used for simplicity, we can avoid use + // of static variable by passing a pointer. + static int time = 0; + + // Initialize discovery time and low value + disc[u] = low[u] = ++time; + st->push(u); + stackMember[u] = true; + + // Go through all vertices adjacent to this + list<int>::iterator i; + for (i = adj[u].begin(); i != adj[u].end(); ++i) + { + int v = *i; // v is current adjacent of 'u' + + // If v is not visited yet, then recur for it + if (disc[v] == -1) + { + SCCUtil(v, disc, low, st, stackMember); + + // Check if the subtree rooted with 'v' has a + // connection to one of the ancestors of 'u' + // Case 1 (per above discussion on Disc and Low value) + low[u] = min(low[u], low[v]); + } + + // Update low value of 'u' only of 'v' is still in stack + // (i.e. it's a back edge, not cross edge). + // Case 2 (per above discussion on Disc and Low value) + else if (stackMember[v] == true) + low[u] = min(low[u], disc[v]); + } + + // head node found, pop the stack and print an SCC + int w = 0; // To store stack extracted vertices + if (low[u] == disc[u]) + { + vector<int> component; + while (st->top() != u) + { + w = (int) st->top(); + component.push_back (w); + stackMember[w] = false; + st->pop(); + } + w = (int) st->top(); + component.push_back (w); + components.push_back (component); + stackMember[w] = false; + st->pop(); + } +} + +// The function to do DFS traversal. It uses SCCUtil() +void Graph::SCC() +{ + int *disc = new int[V]; + int *low = new int[V]; + bool *stackMember = new bool[V]; + stack<int> *st = new stack<int>(); + + // Initialize disc and low, and stackMember arrays + for (int i = 0; i < V; i++) + { + disc[i] = NIL; + low[i] = NIL; + stackMember[i] = false; + } + + // Call the recursive helper function to find strongly + // connected components in DFS tree with vertex 'i' + for (int i = 0; i < V; i++) + if (disc[i] == NIL) + SCCUtil(i, disc, low, st, stackMember); +} + +using namespace std; + +struct function_entry +{ + function_entry (string name, unsigned lineno_start, unsigned id): + m_name (name), m_lineno_start (lineno_start), m_id (id), + m_callees () + {} + + unsigned get_loc () + { + return m_lineno_end - m_lineno_start; + } + + string m_name; + unsigned m_lineno_start; + unsigned m_lineno_end; + unsigned m_id; + vector<unsigned> m_callees; +}; + +map<string, unsigned> fn_to_index_map; +vector<function_entry *> functions; +vector<string> lines; + +static unsigned +get_id_for_fname (string fname) +{ + map<string, unsigned>::iterator it = fn_to_index_map.find (fname); + if (it != fn_to_index_map.end ()) + return it->second; + else + { + unsigned id = fn_to_index_map.size (); + fn_to_index_map[fname] = id; + return id; + } +} + +struct function_component +{ + function_component (vector<int> function_ids): + m_function_ids (function_ids) + {} + + void print() + { + for (unsigned i = 0; i < m_function_ids.size (); i++) + printf ("%s ", functions[m_function_ids[i]]->m_name.c_str ()); + printf ("\n"); + } + + unsigned get_total_loc () + { + unsigned loc = 0; + for (unsigned i = 0; i < m_function_ids.size (); i++) + loc += functions[m_function_ids[i]]->get_loc (); + return loc; + } + + void write (ofstream &s) + { + sort (m_function_ids.begin (), m_function_ids.end ()); + for (unsigned i = 0; i < m_function_ids.size (); i++) + { + function_entry *f = functions[m_function_ids[i]]; + for (unsigned j = f->m_lineno_start; j <= f->m_lineno_end; j++) + s << lines[j] << endl; + s << endl; + } + } + + vector<int> m_function_ids; +}; + +bool component_size_cmp (function_component *a, function_component *b) +{ + return a->get_total_loc () < b->get_total_loc (); +} + +vector<function_component *> components; + +int +main (int argc, char **argv) +{ + if (argc != 2) + return -1; + + string type (argv[1]); + + ifstream infile(type + "-match.c"); + ofstream header(type + "-match-header.c"); + ofstream footer(type + "-match-part-footer.c"); + footer << "#include \"" << type << "-match-header.c\"" << endl; + + string line; + unsigned lineno = 0; + bool in_split = false; + bool header_done = false; + + while (getline(infile, line)) + { + string fnbegin ("// split-fn-begin:"); + string fnend ("// split-fn-end"); + string call ("// call-fn:"); + lines.push_back (line); + + if (line.find(fnbegin) != string::npos) + { + in_split = true; + header_done = true; + string fname = line.substr (fnbegin.length ()); + functions.push_back (new function_entry (fname, lineno, + get_id_for_fname (fname))); + } + else if (line.find (fnend) != string::npos) + { + functions.back ()->m_lineno_end = lineno; + in_split = false; + } + else if (line.find (call) != string::npos) + { + string fname = line.substr (call.length ()); + functions.back ()->m_callees.push_back (get_id_for_fname (fname)); + } + else if (!in_split && !line.empty ()) + { + if (header_done) + footer << line << endl; + else + header << line << endl; + } + + lineno++; + } + + /* + for (unsigned i = 0; i < functions.size (); i++) + { + function_entry *f = functions[i]; + fprintf (stderr, "%d %s: %d\n", + f->m_lineno_end - f->m_lineno_start, + f->m_name.c_str (), f->m_callees.size ()); + } + */ + + /* Create graph and compute SCC. */ + Graph g (functions.size ()); + for (unsigned i = 0; i < functions.size (); i++) + { + function_entry *f = functions[i]; + for (unsigned j = 0; j < f->m_callees.size (); j++) + { + g.addEdge (f->m_id, f->m_callees[j]); + g.addEdge (f->m_callees[j], f->m_id); + } + } + g.SCC (); + + for (unsigned i = 0; i < g.components.size (); i++) + components.push_back (new function_component(g.components[i])); + + sort (components.begin (), components.end (), component_size_cmp); + + unsigned total_loc = 0; + for (unsigned i = 0; i < components.size (); i++) + { +// components[i]->print(); + total_loc += components[i]->get_total_loc (); + } + + printf ("Total # of functions: %ld, total LOC: %u\n", functions.size (), + total_loc); + + vector<vector<function_component *>> groups; + + unsigned parts = 4; + unsigned component_size = total_loc / parts; + if (components.back ()->get_total_loc () > component_size) + component_size = components.back ()->get_total_loc (); + + for (unsigned i = 0; i < parts; i++) + { + unsigned space = component_size; + vector<function_component *> group; + for (int j = components.size () - 1; j >= 0; j--) + { + function_component *c = components[j]; + unsigned loc = c->get_total_loc (); + if (space >= loc || (i == parts - 1)) + { +// fprintf (stderr, "adding %d to group %d\n", loc, i); + components.erase (components.begin () + j); + space -= loc; + group.push_back (c); + } + } + + groups.push_back (group); + } + + for (unsigned i = 0; i < groups.size (); i++) + { + string name = type + "-match-part-" + std::to_string (i) + ".c"; + ofstream s (name); + s << "#include \"" << type << "-match-header.c\"" << endl; + + unsigned loc = 0; + for (unsigned j = 0; j < groups[i].size (); j++) + { + function_component *c = groups[i][j]; + loc += c->get_total_loc (); + c->write (s); + } + s.close (); + + fprintf (stderr, "written %d LOC functions to %s\n", loc, name.c_str ()); + } + + header.close (); + footer.close (); + + return 0; +} -- 2.18.0