> Am 09.12.2024 um 16:19 schrieb Filip Kastl <fka...@suse.cz>:
>
> Hi Richi,
>
> This is the second version of the patch. I've lowered the default value of
> the
> switch-lower-slow-alg-max-cases from 10000 to 1000. I've also noticed that I
> forgot to add an entry into the param section of doc/invoke.texi so I fixed
> that.
>
> I'm bootstraping and regtesting this again just to be extra sure. Ok to push
> if bootstrap/regtest succeeds?
Ok
Thanks,
Richard
> Cheers,
> Filip Kastl
>
>
> -- 8< --
>
>
> This patch adds a limit on the number of cases of a switch. When this
> limit is exceeded, switch lowering decides to use faster but less
> powerful algorithms.
>
> In particular this means that for finding bit tests switch lowering
> decides between the old dynamic programming O(n^2) algorithm and the
> new greedy algorithm that Andi Kleen recently added but then reverted
> due to PR117352. It also means that switch lowering may bail out on
> finding jump tables if the switch is too large (Btw it also may not
> bail! It can happen that the greedy algorithms finds some bit tests
> which then basically split the switch into multiple smaller switches and
> those may be small enough to fit under the limit.)
>
> The limit is implemented as --param switch-lower-slow-alg-max-cases.
> Exceeding the limit is reported through -Wdisabled-optimization.
>
> This patch fixes the issue with the greedy algorithm described in
> PR117352. The problem was incorrect usage of the is_beneficial()
> heuristic.
>
> gcc/ChangeLog:
>
> PR middle-end/117091
> PR middle-end/117352
> * doc/invoke.texi: Add switch-lower-slow-alg-max-cases.
> * params.opt: Add switch-lower-slow-alg-max-cases.
> * tree-switch-conversion.cc (jump_table_cluster::find_jump_tables):
> Note in a comment that we are looking for jump tables in
> case sequences delimited by the already found bit tests.
> (bit_test_cluster::find_bit_tests): Decide between
> find_bit_tests_fast() and find_bit_tests_slow().
> (bit_test_cluster::find_bit_tests_fast): New function.
> (bit_test_cluster::find_bit_tests_slow): New function.
> (switch_decision_tree::analyze_switch_statement): Report
> exceeding the limit.
> * tree-switch-conversion.h: Add find_bit_tests_fast() and
> find_bit_tests_slow().
>
> Co-Authored-By: Andi Kleen <a...@gcc.gnu.org>
> Signed-off-by: Filip Kastl <fka...@suse.cz>
> ---
> gcc/doc/invoke.texi | 3 +
> gcc/params.opt | 4 ++
> gcc/tree-switch-conversion.cc | 112 +++++++++++++++++++++++++++++++---
> gcc/tree-switch-conversion.h | 18 ++++++
> 4 files changed, 130 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 14afd1934bd..3cb9a50b690 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -16500,6 +16500,9 @@ Switch initialization conversion refuses to create
> arrays that are
> bigger than @option{switch-conversion-max-branch-ratio} times the number of
> branches in the switch.
>
> +@item switch-lower-slow-alg-max-cases
> +Maximum number of cases for slow switch lowering algorithms to be used.
> +
> @item max-partial-antic-length
> Maximum length of the partial antic set computed during the tree
> partial redundancy elimination optimization (@option{-ftree-pre}) when
> diff --git a/gcc/params.opt b/gcc/params.opt
> index 5853bf02f9e..1c88d5212c4 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -1052,6 +1052,10 @@ Maximum number of instruction distance that a small
> store forwarded to a larger
> Common Joined UInteger Var(param_switch_conversion_branch_ratio) Init(8)
> IntegerRange(1, 65536) Param Optimization
> The maximum ratio between array size and switch branches for a switch
> conversion to take place.
>
> +-param=switch-lower-slow-alg-max-cases=
> +Common Joined UInteger Var(param_switch_lower_slow_alg_max_cases) Init(1000)
> IntegerRange(1, 1000000000) Param Optimization
> +Maximum number of cases for slow switch lowering algorithms to be used.
> +
> -param=modref-max-bases=
> Common Joined UInteger Var(param_modref_max_bases) Init(32) Param Optimization
> Maximum number of bases stored in each modref tree.
> diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
> index 3436c2a8b98..b98e70cf7d1 100644
> --- a/gcc/tree-switch-conversion.cc
> +++ b/gcc/tree-switch-conversion.cc
> @@ -54,6 +54,7 @@ Software Foundation, 51 Franklin Street, Fifth Floor,
> Boston, MA
> #include "tree-cfgcleanup.h"
> #include "hwint.h"
> #include "internal-fn.h"
> +#include "diagnostic-core.h"
>
> /* ??? For lang_hooks.types.type_for_mode, but is there a word_mode
> type in the GIMPLE type system that is language-independent? */
> @@ -1641,6 +1642,11 @@ jump_table_cluster::find_jump_tables (vec<cluster *>
> &clusters)
> return clusters.copy ();
>
> unsigned l = clusters.length ();
> +
> + /* Note: l + 1 is the number of cases of the switch. */
> + if (l + 1 > (unsigned) param_switch_lower_slow_alg_max_cases)
> + return clusters.copy ();
> +
> auto_vec<min_cluster_item> min;
> min.reserve (l + 1);
>
> @@ -1771,16 +1777,80 @@ jump_table_cluster::is_beneficial (const vec<cluster
> *> &,
> return end - start + 1 >= case_values_threshold ();
> }
>
> -/* Find bit tests of given CLUSTERS, where all members of the vector
> - are of type simple_cluster. MAX_C is the approx max number of cases per
> - label. New clusters are returned. */
> +/* Find bit tests of given CLUSTERS, where all members of the vector are of
> + type simple_cluster. Use a fast algorithm that might not find the optimal
> + solution (minimal number of clusters on the output). New clusters are
> + returned.
> +
> + You should call find_bit_tests () instead of calling this function
> + directly. */
>
> vec<cluster *>
> -bit_test_cluster::find_bit_tests (vec<cluster *> &clusters, int max_c)
> +bit_test_cluster::find_bit_tests_fast (vec<cluster *> &clusters)
> {
> - if (!is_enabled () || max_c == 1)
> - return clusters.copy ();
> + unsigned l = clusters.length ();
> + vec<cluster *> output;
> +
> + output.create (l);
> +
> + /* Look at sliding BITS_PER_WORD sized windows in the switch value space
> + and determine if they are suitable for a bit test cluster. Worst case
> + this can examine every value BITS_PER_WORD-1 times. */
> + unsigned k;
> + for (unsigned i = 0; i < l; i += k)
> + {
> + hash_set<basic_block> targets;
> + cluster *start_cluster = clusters[i];
> +
> + /* Find the biggest k such that clusters i to i+k-1 can be turned into
> a
> + one big bit test cluster. */
> + k = 0;
> + while (i + k < l)
> + {
> + cluster *end_cluster = clusters[i + k];
> +
> + /* Does value range fit into the BITS_PER_WORD window? */
> + HOST_WIDE_INT w = cluster::get_range (start_cluster->get_low (),
> + end_cluster->get_high ());
> + if (w == 0 || w > BITS_PER_WORD)
> + break;
> +
> + /* Check for max # of targets. */
> + if (targets.elements () == m_max_case_bit_tests
> + && !targets.contains (end_cluster->m_case_bb))
> + break;
> +
> + targets.add (end_cluster->m_case_bb);
> + k++;
> + }
>
> + if (is_beneficial (k, targets.elements ()))
> + {
> + output.safe_push (new bit_test_cluster (clusters, i, i + k - 1,
> + i == 0 && k == l));
> + }
> + else
> + {
> + output.safe_push (clusters[i]);
> + /* ??? Might be able to skip more. */
> + k = 1;
> + }
> + }
> +
> + return output;
> +}
> +
> +/* Find bit tests of given CLUSTERS, where all members of the vector
> + are of type simple_cluster. Use a slow (quadratic) algorithm that always
> + finds the optimal solution (minimal number of clusters on the output).
> New
> + clusters are returned.
> +
> + You should call find_bit_tests () instead of calling this function
> + directly. */
> +
> +vec<cluster *>
> +bit_test_cluster::find_bit_tests_slow (vec<cluster *> &clusters)
> +{
> unsigned l = clusters.length ();
> auto_vec<min_cluster_item> min;
> min.reserve (l + 1);
> @@ -1834,6 +1904,25 @@ bit_test_cluster::find_bit_tests (vec<cluster *>
> &clusters, int max_c)
> return output;
> }
>
> +/* Find bit tests of given CLUSTERS, where all members of the vector
> + are of type simple_cluster. MAX_C is the approx max number of cases per
> + label. New clusters are returned. */
> +
> +vec<cluster *>
> +bit_test_cluster::find_bit_tests (vec<cluster *> &clusters, int max_c)
> +{
> + if (!is_enabled () || max_c == 1)
> + return clusters.copy ();
> +
> + unsigned l = clusters.length ();
> +
> + /* Note: l + 1 is the number of cases of the switch. */
> + if (l + 1 > (unsigned) param_switch_lower_slow_alg_max_cases)
> + return find_bit_tests_fast (clusters);
> + else
> + return find_bit_tests_slow (clusters);
> +}
> +
> /* Return true when RANGE of case values with UNIQ labels
> can build a bit test. */
>
> @@ -2264,10 +2353,19 @@ switch_decision_tree::analyze_switch_statement ()
>
> reset_out_edges_aux (m_switch);
>
> + if (l > (unsigned) param_switch_lower_slow_alg_max_cases)
> + warning_at (gimple_location (m_switch), OPT_Wdisabled_optimization,
> + "Using faster switch lowering algorithms. "
> + "Number of switch cases (%d) exceeds "
> + "%<--param=switch-lower-slow-alg-max-cases=%d%> limit.",
> + l, param_switch_lower_slow_alg_max_cases);
> +
> /* Find bit-test clusters. */
> vec<cluster *> output = bit_test_cluster::find_bit_tests (clusters, max_c);
>
> - /* Find jump table clusters. */
> + /* Find jump table clusters. We are looking for these in the sequences of
> + simple clusters which we didn't manage to convert into bit-test
> + clusters. */
> vec<cluster *> output2;
> auto_vec<cluster *> tmp;
> output2.create (1);
> diff --git a/gcc/tree-switch-conversion.h b/gcc/tree-switch-conversion.h
> index e6a85fa6025..560620903a8 100644
> --- a/gcc/tree-switch-conversion.h
> +++ b/gcc/tree-switch-conversion.h
> @@ -397,6 +397,24 @@ public:
> tree default_label_expr, basic_block default_bb, location_t loc)
> final override;
>
> + /* Find bit tests of given CLUSTERS, where all members of the vector are of
> + type simple_cluster. Use a fast algorithm that might not find the
> optimal
> + solution (minimal number of clusters on the output). New clusters are
> + returned.
> +
> + You should call find_bit_tests () instead of calling this function
> + directly. */
> + static vec<cluster *> find_bit_tests_fast (vec<cluster *> &clusters);
> +
> + /* Find bit tests of given CLUSTERS, where all members of the vector
> + are of type simple_cluster. Use a slow (quadratic) algorithm that
> always
> + finds the optimal solution (minimal number of clusters on the output).
> New
> + clusters are returned.
> +
> + You should call find_bit_tests () instead of calling this function
> + directly. */
> + static vec<cluster *> find_bit_tests_slow (vec<cluster *> &clusters);
> +
> /* Find bit tests of given CLUSTERS, where all members of the vector
> are of type simple_cluster. New clusters are returned. */
> static vec<cluster *> find_bit_tests (vec<cluster *> &clusters, int max_c);
> --
> 2.46.0
>