On Fri, 2017-10-06 at 14:25 +0200, Martin Liška wrote: > Hello. > > As presented at this year's Cauldron, I rewrote current switch > expansion to support > multiple algorithms (jump table and bit-test) that can be used just > for a fraction > of cases. Balanced decision tree is built on top of that. I decided > to do a bigger > refactoring and put all there 3 mentioned algorithm to its own class. > > There's a bigger change in jump_table_cluster::can_be_handled where > the constant 10 (3 respectively) > is compared to number of handled values, and not number of cases. > Later one is wrong in my opinion. > > There are some numbers for cc1plus: > > $ bloaty ./objdir2/gcc/cc1plus -- ./objdir/gcc/cc1plus > VM SIZE FILE SIZE > ++++++++++++++ GROWING ++++++++++++++ > +19% +1.32Mi .rodata +1.32Mi +19% > [ = ] 0 .symtab +7.38Ki +0.5% > [ = ] 0 .strtab +5.18Ki +0.2% > [ = ] 0 .debug_info +743 +0.0% > +0.0% +712 .eh_frame +712 +0.0% > +0.1% +440 .eh_frame_hdr +440 +0.1% > [ = ] 0 .debug_aranges +80 +0.1% > +0.0% +67 .dynstr +67 +0.0% > [ = ] 0 .debug_str +6 +0.0% > > -------------- SHRINKING -------------- > -1.2% -214Ki .text -214Ki -1.2% > [ = ] 0 .debug_loc -66.7Ki -0.1% > [ = ] 0 .debug_line -14.0Ki -0.2% > [ = ] 0 .debug_ranges -9.56Ki -0.1% > -6.8% -3 [Unmapped] -375 -14.4% > [ = ] 0 .debug_abbrev -46 -0.0% > > +3.8% +1.11Mi TOTAL +1.03Mi +0.5% > > So it growth and can be easily explained: > > insn-attrtab.o: > VM SIZE FILE SIZE > ++++++++++++++ GROWING ++++++++++++++ > [ = ] 0 .rela.rodata +2.00Mi +215% > +214% +682Ki .rodata +682Ki +214% > [ = ] 0 .rela.debug_loc +29.3Ki +36% > +32% +1.91Ki .eh_frame +1.91Ki +32% > [ = ] 0 .debug_loc +37 +5.6% > [ = ] 0 .debug_str +2 +0.0% > > -------------- SHRINKING -------------- > -50.1% -63.3Ki .text -63.3Ki -50.1% > [ = ] 0 .debug_line -1.71Ki -10.4% > [ = ] 0 .rela.debug_info -768 -0.3% > [ = ] 0 .rela.text -624 -0.8% > [ = ] 0 .rela.debug_ranges -384 -2.7% > [ = ] 0 .debug_info -87 -0.2% > [ = ] 0 [Unmapped] -2 -8.7% > > +137% +621Ki TOTAL +2.63Mi +139% > > It's caused by: > ... > ;; GIMPLE switch case clusters: JT(2710):-1-5090 > ;; GIMPLE switch case clusters: JT(2710):-1-5090 > ;; GIMPLE switch case clusters: JT(2967):-1-5090 > ;; GIMPLE switch case clusters: JT(1033):-1-5017 > ... > > so there are many switch statements with very reasonable density. > > and > insn-dfatab.o: +13% +99.4Ki TOTAL +455Ki +14% > insn-latencytab.o: +19% +136Ki TOTAL +609Ki +20% > > There shouldn't be any fallout other from what I mentioned in > previous email that is > a prerequisite for this patch. > > Patch can bootstrap on ppc64le-redhat-linux and survives regression > tests. > > I'm still planning to come with some numbers and stats. Will do that > next week. > > Martin
> gcc/ChangeLog: > > 2017-09-29 Martin Liska <mli...@suse.cz> > > * tree-switch-conversion.c (MAX_CASE_BIT_TESTS): Remove and move > to header file. > (hoist_edge_and_branch_if_true): Move to ... > (bit_test_cluster::hoist_edge_and_branch_if_true): ... this. I'm not a reviewer, but there's a lot of "churn" in the patch due to things moving around; there seems to be a mixture of things simply moving around, functions becoming methods of classes, and some things changing. Would it be helpful to split the patch into two: a patch that moves the functions to where they'll be needed, and then a patch to do the "actual" (or functional) changes? [...snip...] > diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h > new file mode 100644 > index 00000000000..45ae11f408d > --- /dev/null > +++ b/gcc/tree-switch-conversion.h > @@ -0,0 +1,806 @@ > +/* Tree switch conversion for GNU compiler. > + Copyright (C) 2017 Free Software Foundation, Inc. Should this be "gimple switch conversion" and gimple-ssa-switch-conversion.h? (but presumably this is to mirror tree-switch-conversion.c, and I don't want to advocate for unnecessary churn) > +This file is part of GCC. > + > +GCC is free software; you can redistribute it and/or modify it under > +the terms of the GNU General Public License as published by the Free > +Software Foundation; either version 3, or (at your option) any later > +version. > + > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY > +WARRANTY; without even the implied warranty of MERCHANTABILITY or > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > +for more details. > + > +You should have received a copy of the GNU General Public License > +along with GCC; see the file COPYING3. If not see > +<http://www.gnu.org/licenses/>. */ > + > +#ifndef TREE_SWITCH_CONVERSION_H > +#define TREE_SWITCH_CONVERSION_H > + > +namespace tree_switch_conversion { > + > +/* Type of cluster. */ > + > +enum cluster_type > +{ > + SIMPLE_CASE, > + JUMP_TABLE, > + BIT_TEST > +}; > + > +#define PRINT_CASE(f,c) print_dec (c, f, TYPE_SIGN (TREE_TYPE (c))) > + > +/* Base class for switch clustering. */ Maybe "Abstract base class for representing a cluster of cases", or somesuch? Perhaps an ASCII art inheritance diagram; if I'm reading the code right, something like: /* Abstract base class for representing a cluster of cases. Here is the inheritance hierarachy, and the enum_cluster_type values for the concrete subclasses: cluster |-simple_cluster (SIMPLE_CASE) `-group_cluster |-jump_table_cluster (JUMP_TABLE) `-bit_test_cluster (BIT_TEST). */ Am I right in thinking that the switch initialization conversion (struct switch_conversion) is currently distinct from the clustering in this patch? (I'm wondering if there's possible benefit from combining the two approaches, maybe if one cluster of cases predominantly assigns to some variables, and another cluster of cases predominantly assigns to some other variables). > + > +struct cluster > +{ > + /* Cosntructor. */ ^^^ Typo > + cluster () > + {} When is this default ctor used? Doesn't it leave the various fields uninitialized? I notice that of the descendant classes, at least jump_table_cluster doesn't call its base class ctor (if I'm reading the patch right), so I believe these fields are uninitialized for some of the subclasses. > + > + /* Constructor. */ > + cluster (tree case_label_expr, basic_block case_bb, profile_probability > prob, > + profile_probability subtree_prob); > + [...snip...] > +struct simple_cluster: public cluster > +{ Could this struct have a descriptive comment? e.g. /* Subclass of cluster representing a simple contiguous range from [low..high]. */ or somesuch (assuming I've read the intent correctly) > + /* Constructor. */ > + simple_cluster (tree low, tree high, tree case_label_expr, > + basic_block case_bb, profile_probability prob); > + > + /* Destructor. */ > + virtual ~simple_cluster () > + {} > + > + virtual cluster_type > + get_type () > + { > + return SIMPLE_CASE; > + } I believe that the "virtual" here is redundant, but that it ought to say get_type () OVERRIDE FINAL so tell human readers that it's a vfunc override, and so that C++11 onwards can warn if it's not actually a vfunc override. > + virtual tree > + get_low () > + { > + return m_low; > + } > + > + virtual tree > + get_high () > + { > + return m_high; > + } > + > + virtual void > + debug () > + { > + dump (stderr); > + } > + > + virtual void > + dump (FILE *f) > + { > + PRINT_CASE (f, get_low ()); > + if (get_low () != get_high ()) > + { > + fprintf (f, "-"); > + PRINT_CASE (f, get_high ()); > + } > + fprintf (f, " "); > + } Likewise for all of these, I believe. > + /* Low value of the case. */ > + tree m_low; > + > + /* High value of the case. */ > + tree m_high; > +}; > + > +simple_cluster::simple_cluster (tree low, tree high, tree case_label_expr, > + basic_block case_bb, profile_probability prob): > + cluster (case_label_expr, case_bb, prob, prob), > + m_low (low), m_high (high) > +{ > +} > +struct group_cluster: public cluster > +{ Again, could this subclass have a descriptive comment, something like. /* Abstract subclass of cluster, handling a collection of simple_cluster instances. */ or somesuch (I believe it's abstract because get_type isn't implemented). > + /* Destructor. */ > + virtual ~group_cluster (); > + > + virtual tree > + get_low () > + { > + return m_cases[0]->get_low (); > + } > + > + virtual tree > + get_high () > + { > + return m_cases[m_cases.length () - 1]->get_high (); > + } > + > + virtual void > + debug () > + { > + dump (stderr); > + } > + > + virtual void > + dump (FILE *f) > + { > + unsigned total_values = 0; > + for (unsigned i = 0; i < m_cases.length (); i++) > + total_values += m_cases[i]->get_range (m_cases[i]->get_low (), > + m_cases[i]->get_high ()); > + > + fprintf (f, "%s(%d):", get_type () == JUMP_TABLE ? "JT" : "BT", > + total_values); > + PRINT_CASE (f, get_low ()); > + fprintf (f, "-"); > + PRINT_CASE (f, get_high ()); > + fprintf (f, " "); > + } As before, lose the "virtual" and add OVERRIDE FINAL for all of these, I believe. > + > + /* List of simple clusters handled by the group. */ > + vec<simple_cluster *> m_cases; > +}; > + > +struct jump_table_cluster: public group_cluster Suggested descriptive comment: /* Concrete subclass of group_cluster representing a collection of cases to be implemented as a jump table. The "emit" vfunc gernerates a nested switch statement which is later lowered to a jump table. */ or somesuch. > +{ > + /* Constructor. */ > + jump_table_cluster (vec<cluster *> &clusters, unsigned start, unsigned > end); > + > + virtual cluster_type > + get_type () > + { > + return JUMP_TABLE; > + } > + > + virtual void emit (tree index_expr, tree index_type, > + tree default_label_expr, basic_block default_bb); As before, lose the "virtual" and add OVERRIDE FINAL for both of these (I think it matters much for "emit" given the non-trivial signature and the empty default implementation). > + /* Find jump tables of given CLUSTERS, where all members of the vector > + are of type simple_cluster. New clusters are returned. */ > + static vec<cluster *> find_jump_tables (vec<cluster *> &clusters); > + > + /* Return true when cluster starting at START and ending at END (inclusive) > + can build a jump-table. */ > + static bool can_be_handled (const vec<cluster *> &clusters, unsigned start, > + unsigned end); > + > + /* Return true if cluster starting at START and ending at END (inclusive) > + is profitable transformation. */ > + static bool is_beneficial (const vec<cluster *> &clusters, unsigned start, > + unsigned end); > + > + /* Return the smallest number of different values for which it is best > + to use a jump-table instead of a tree of conditional branches. */ > + static inline unsigned int case_values_threshold (void); > +}; > + > +/* A GIMPLE switch statement can be expanded to a short sequence of bit-wise > +comparisons. "switch(x)" is converted into "if ((1 << (x-MINVAL)) & CST)" > +where CST and MINVAL are integer constants. This is better than a series > +of compare-and-banch insns in some cases, e.g. we can implement: > + > + if ((x==4) || (x==6) || (x==9) || (x==11)) > + > +as a single bit test: > + > + if ((1<<x) & ((1<<4)|(1<<6)|(1<<9)|(1<<11))) > + > +This transformation is only applied if the number of case targets is small, > +if CST constains at least 3 bits, and "1 << x" is cheap. The bit tests are > +performed in "word_mode". > + > +The following example shows the code the transformation generates: > + > + int bar(int x) > + { > + switch (x) > + { > + case '0': case '1': case '2': case '3': case '4': > + case '5': case '6': case '7': case '8': case '9': > + case 'A': case 'B': case 'C': case 'D': case 'E': > + case 'F': > + return 1; > + } > + return 0; > + } > + > +==> > + > + bar (int x) > + { > + tmp1 = x - 48; > + if (tmp1 > (70 - 48)) goto L2; > + tmp2 = 1 << tmp1; > + tmp3 = 0b11111100000001111111111; > + if ((tmp2 & tmp3) != 0) goto L1 ; else goto L2; > + L1: > + return 1; > + L2: > + return 0; > + } > + > +TODO: There are still some improvements to this transformation that could > +be implemented: > + > +* A narrower mode than word_mode could be used if that is cheaper, e.g. > + for x86_64 where a narrower-mode shift may result in smaller code. > + > +* The compounded constant could be shifted rather than the one. The > + test would be either on the sign bit or on the least significant bit, > + depending on the direction of the shift. On some machines, the test > + for the branch would be free if the bit to test is already set by the > + shift operation. > + > +This transformation was contributed by Roger Sayle, see this e-mail: > + http://gcc.gnu.org/ml/gcc-patches/2003-01/msg01950.html > +*/ > + > +struct bit_test_cluster: public group_cluster > +{ > + /* Constructor. */ > + bit_test_cluster (vec<cluster *> &clusters, unsigned start, unsigned end); > + > + virtual cluster_type > + get_type () > + { > + return BIT_TEST; > + } > + > +/* Expand a switch statement by a short sequence of bit-wise > + comparisons. "switch(x)" is effectively converted into > + "if ((1 << (x-MINVAL)) & CST)" where CST and MINVAL are > + integer constants. > + > + INDEX_EXPR is the value being switched on. > + > + MINVAL is the lowest case value of in the case nodes, > + and RANGE is highest value minus MINVAL. MINVAL and RANGE > + are not guaranteed to be of the same type as INDEX_EXPR > + (the gimplifier doesn't change the type of case label values, > + and MINVAL and RANGE are derived from those values). > + MAXVAL is MINVAL + RANGE. > + > + There *MUST* be max_case_bit_tests or less unique case > + node targets. */ > + virtual void emit (tree index_expr, tree index_type, > + tree default_label_expr, basic_block default_bb); As before, lose the "virtual" and add OVERRIDE FINAL for both of these. > + /* Find bit tests of given CLUSTERS, where all members of the vector > + are of type simple_cluster. New clusters are returned. */ > + static vec<cluster *> find_bit_tests (vec<cluster *> &clusters); > + > + /* Return true when cluster starting at START and ending at END (inclusive) > + can build a bit test. */ > + static bool can_be_handled (const vec<cluster *> &clusters, unsigned start, > + unsigned end); > + > + /* Return true if cluster starting at START and ending at END (inclusive) > + is profitable transformation. */ > + static bool is_beneficial (const vec<cluster *> &clusters, unsigned start, > + unsigned end); > + > +/* Split the basic block at the statement pointed to by GSIP, and insert > + a branch to the target basic block of E_TRUE conditional on tree > + expression COND. > + > + It is assumed that there is already an edge from the to-be-split > + basic block to E_TRUE->dest block. This edge is removed, and the > + profile information on the edge is re-used for the new conditional > + jump. > + > + The CFG is updated. The dominator tree will not be valid after > + this transformation, but the immediate dominators are updated if > + UPDATE_DOMINATORS is true. > + > + Returns the newly created basic block. */ > + static basic_block hoist_edge_and_branch_if_true (gimple_stmt_iterator > *gsip, > + tree cond, > + basic_block case_bb); > + > + /* Maximum number of different basic blocks that can be handlel by Typo: handlel -> handled > + a bit test. */ > + static const int m_max_case_bit_tests = 3; > +}; > + [...snip...] Hope this is constructive Dave