On Fri, 15 Nov 2024, Filip Kastl wrote:

> Hi,
> 
> Andi's greedy bit test finding algorithm was reverted.  I found a fix for the
> problem that caused the revert.  I made this patch to reintroduce the greedy
> alg into GCC.  However I think we should keep the old slow but more powerful
> algorithm so I added a limit on the number of cases of the switch statement
> that decides which of the two algorithms to use.
> 
> Bootstrapped and regtested on x86_64-linux.
> 
> I've also bootstrapped and regtested version of this patch where I force 
> switch
> lowering to only use the Andi's greedy algorithm (with my fix).  This also 
> went
> smoothly without any testsuite regression nor problems during bootstrap (which
> surprised me -- I expected some testcases to rely on bit test finding to be
> powerful but I double checked and really didn't find any problems).  OFC I've
> also checked that the greedy algorithm doesn't cause the problems from 
> PR117352
> anymore.
> 
> Ok to push?
> 
> Thanks,
> Filip Kastl
> 
> 
> -- 8< --
> 
> 
> This patch adds a limit on the number of cases of a switch.  When this
> limit is exceeded, switch lowering decides to use faster but less
> powerful algorithms.
> 
> In particular this means that for finding bit tests switch lowering
> decides between the old dynamic programming O(n^2) algorithm and the
> new greedy algorithm that Andi Kleen recently added but then reverted
> due to PR117352.  It also means that switch lowering may bail out on
> finding jump tables if the switch is too large  (Btw it also may not
> bail!  It can happen that the greedy algorithms finds some bit tests
> which then basically split the switch into multiple smaller switches and
> those may be small enough to fit under the limit.)
> 
> The limit is implemented as --param switch-lower-slow-alg-max-cases.
> Exceeding the limit is reported through -Wdisabled-optimization.
> 
> This patch fixes the issue with the greedy algorithm described in
> PR117352.  The problem was incorrect usage of the is_beneficial()
> heuristic.

Did you adjust the testcase to use the 10000 default of the param
and figure it's still fast enough with it?  If the alg is quadratic
10000^2 is quite large ;)

OK (besides that new --param default maybe).

Thanks,
Richard.

> gcc/ChangeLog:
> 
>       PR middle-end/117091
>       PR middle-end/117352
>       * params.opt: Add switch-lower-slow-alg-max-cases.
>       * tree-switch-conversion.cc (jump_table_cluster::find_jump_tables):
>       Note in a comment that we are looking for jump tables in
>       case sequences delimited by the already found bit tests.
>       (bit_test_cluster::find_bit_tests): Decide between
>       find_bit_tests_fast() and find_bit_tests_slow().
>       (bit_test_cluster::find_bit_tests_fast): New function.
>       (bit_test_cluster::find_bit_tests_slow): New function.
>       (switch_decision_tree::analyze_switch_statement): Report
>       exceeding the limit.
>       * tree-switch-conversion.h: Add find_bit_tests_fast() and
>       find_bit_tests_slow().
> 
> Co-Authored-By: Andi Kleen <a...@gcc.gnu.org>
> Signed-off-by: Filip Kastl <fka...@suse.cz>
> ---
>  gcc/params.opt                |   4 ++
>  gcc/tree-switch-conversion.cc | 112 +++++++++++++++++++++++++++++++---
>  gcc/tree-switch-conversion.h  |  18 ++++++
>  3 files changed, 127 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/params.opt b/gcc/params.opt
> index 7c572774df2..edd638565b5 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -1044,6 +1044,10 @@ Maximum size of a single store merging region in bytes.
>  Common Joined UInteger Var(param_switch_conversion_branch_ratio) Init(8) 
> IntegerRange(1, 65536) Param Optimization
>  The maximum ratio between array size and switch branches for a switch 
> conversion to take place.
>  
> +-param=switch-lower-slow-alg-max-cases=
> +Common Joined UInteger Var(param_switch_lower_slow_alg_max_cases) 
> Init(10000) IntegerRange(1, 1000000000) Param Optimization
> +Maximum number of cases for slow switch lowering algorithms to be used.
> +
>  -param=modref-max-bases=
>  Common Joined UInteger Var(param_modref_max_bases) Init(32) Param 
> Optimization
>  Maximum number of bases stored in each modref tree.
> diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
> index 852419b2f4b..3c8cbce5d37 100644
> --- a/gcc/tree-switch-conversion.cc
> +++ b/gcc/tree-switch-conversion.cc
> @@ -55,6 +55,7 @@ Software Foundation, 51 Franklin Street, Fifth Floor, 
> Boston, MA
>  #include "tree-cfgcleanup.h"
>  #include "hwint.h"
>  #include "internal-fn.h"
> +#include "diagnostic-core.h"
>  
>  /* ??? For lang_hooks.types.type_for_mode, but is there a word_mode
>     type in the GIMPLE type system that is language-independent?  */
> @@ -1642,6 +1643,11 @@ jump_table_cluster::find_jump_tables (vec<cluster *> 
> &clusters)
>      return clusters.copy ();
>  
>    unsigned l = clusters.length ();
> +
> +  /* Note: l + 1 is the number of cases of the switch.  */
> +  if (l + 1 > (unsigned) param_switch_lower_slow_alg_max_cases)
> +    return clusters.copy ();
> +
>    auto_vec<min_cluster_item> min;
>    min.reserve (l + 1);
>  
> @@ -1772,16 +1778,80 @@ jump_table_cluster::is_beneficial (const vec<cluster 
> *> &,
>    return end - start + 1 >= case_values_threshold ();
>  }
>  
> -/* Find bit tests of given CLUSTERS, where all members of the vector
> -   are of type simple_cluster.   MAX_C is the approx max number of cases per
> -   label.  New clusters are returned.  */
> +/* Find bit tests of given CLUSTERS, where all members of the vector are of
> +   type simple_cluster.  Use a fast algorithm that might not find the optimal
> +   solution (minimal number of clusters on the output).  New clusters are
> +   returned.
> +
> +   You should call find_bit_tests () instead of calling this function
> +   directly.  */
>  
>  vec<cluster *>
> -bit_test_cluster::find_bit_tests (vec<cluster *> &clusters, int max_c)
> +bit_test_cluster::find_bit_tests_fast (vec<cluster *> &clusters)
>  {
> -  if (!is_enabled () || max_c == 1)
> -    return clusters.copy ();
> +  unsigned l = clusters.length ();
> +  vec<cluster *> output;
> +
> +  output.create (l);
> +
> +  /* Look at sliding BITS_PER_WORD sized windows in the switch value space
> +     and determine if they are suitable for a bit test cluster.  Worst case
> +     this can examine every value BITS_PER_WORD-1 times.  */
> +  unsigned k;
> +  for (unsigned i = 0; i < l; i += k)
> +    {
> +      hash_set<basic_block> targets;
> +      cluster *start_cluster = clusters[i];
> +
> +      /* Find the biggest k such that clusters i to i+k-1 can be turned into 
> a
> +      one big bit test cluster.  */
> +      k = 0;
> +      while (i + k < l)
> +     {
> +       cluster *end_cluster = clusters[i + k];
> +
> +       /* Does value range fit into the BITS_PER_WORD window?  */
> +       HOST_WIDE_INT w = cluster::get_range (start_cluster->get_low (),
> +                                             end_cluster->get_high ());
> +       if (w == 0 || w > BITS_PER_WORD)
> +         break;
> +
> +       /* Check for max # of targets.  */
> +       if (targets.elements () == m_max_case_bit_tests
> +           && !targets.contains (end_cluster->m_case_bb))
> +         break;
> +
> +       targets.add (end_cluster->m_case_bb);
> +       k++;
> +     }
>  
> +      if (is_beneficial (k, targets.elements ()))
> +     {
> +       output.safe_push (new bit_test_cluster (clusters, i, i + k - 1,
> +                                               i == 0 && k == l));
> +     }
> +      else
> +     {
> +       output.safe_push (clusters[i]);
> +       /* ??? Might be able to skip more.  */
> +       k = 1;
> +     }
> +    }
> +
> +  return output;
> +}
> +
> +/* Find bit tests of given CLUSTERS, where all members of the vector
> +   are of type simple_cluster.  Use a slow (quadratic) algorithm that always
> +   finds the optimal solution (minimal number of clusters on the output).  
> New
> +   clusters are returned.
> +
> +   You should call find_bit_tests () instead of calling this function
> +   directly.  */
> +
> +vec<cluster *>
> +bit_test_cluster::find_bit_tests_slow (vec<cluster *> &clusters)
> +{
>    unsigned l = clusters.length ();
>    auto_vec<min_cluster_item> min;
>    min.reserve (l + 1);
> @@ -1835,6 +1905,25 @@ bit_test_cluster::find_bit_tests (vec<cluster *> 
> &clusters, int max_c)
>    return output;
>  }
>  
> +/* Find bit tests of given CLUSTERS, where all members of the vector
> +   are of type simple_cluster.  MAX_C is the approx max number of cases per
> +   label.  New clusters are returned.  */
> +
> +vec<cluster *>
> +bit_test_cluster::find_bit_tests (vec<cluster *> &clusters, int max_c)
> +{
> +  if (!is_enabled () || max_c == 1)
> +    return clusters.copy ();
> +
> +  unsigned l = clusters.length ();
> +
> +  /* Note: l + 1 is the number of cases of the switch.  */
> +  if (l + 1 > (unsigned) param_switch_lower_slow_alg_max_cases)
> +    return find_bit_tests_fast (clusters);
> +  else
> +    return find_bit_tests_slow (clusters);
> +}
> +
>  /* Return true when RANGE of case values with UNIQ labels
>     can build a bit test.  */
>  
> @@ -2265,10 +2354,19 @@ switch_decision_tree::analyze_switch_statement ()
>  
>    reset_out_edges_aux (m_switch);
>  
> +  if (l > (unsigned) param_switch_lower_slow_alg_max_cases)
> +    warning_at (gimple_location (m_switch), OPT_Wdisabled_optimization,
> +            "Using faster switch lowering algorithms. "
> +            "Number of switch cases (%d) exceeds "
> +            "%<--param=switch-lower-slow-alg-max-cases=%d%> limit.",
> +            l, param_switch_lower_slow_alg_max_cases);
> +
>    /* Find bit-test clusters.  */
>    vec<cluster *> output = bit_test_cluster::find_bit_tests (clusters, max_c);
>  
> -  /* Find jump table clusters.  */
> +  /* Find jump table clusters.  We are looking for these in the sequences of
> +     simple clusters which we didn't manage to convert into bit-test
> +     clusters.  */
>    vec<cluster *> output2;
>    auto_vec<cluster *> tmp;
>    output2.create (1);
> diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
> index e6a85fa6025..560620903a8 100644
> --- a/gcc/tree-switch-conversion.h
> +++ b/gcc/tree-switch-conversion.h
> @@ -397,6 +397,24 @@ public:
>            tree default_label_expr, basic_block default_bb, location_t loc)
>       final override;
>  
> +  /* Find bit tests of given CLUSTERS, where all members of the vector are of
> +     type simple_cluster.  Use a fast algorithm that might not find the 
> optimal
> +     solution (minimal number of clusters on the output).  New clusters are
> +     returned.
> +
> +     You should call find_bit_tests () instead of calling this function
> +     directly.  */
> +  static vec<cluster *> find_bit_tests_fast (vec<cluster *> &clusters);
> +
> +  /* Find bit tests of given CLUSTERS, where all members of the vector
> +     are of type simple_cluster.  Use a slow (quadratic) algorithm that 
> always
> +     finds the optimal solution (minimal number of clusters on the output).  
> New
> +     clusters are returned.
> +
> +     You should call find_bit_tests () instead of calling this function
> +     directly.  */
> +  static vec<cluster *> find_bit_tests_slow (vec<cluster *> &clusters);
> +
>    /* Find bit tests of given CLUSTERS, where all members of the vector
>       are of type simple_cluster.  New clusters are returned.  */
>    static vec<cluster *> find_bit_tests (vec<cluster *> &clusters, int max_c);
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to