On Thu, Dec 9, 2021 at 2:02 PM Martin Liška <mli...@suse.cz> wrote:
>
> On 11/30/21 12:17, Richard Biener wrote:
> > I'd like to see the gswitch support - that's what was posted before stage3
> > close, this patch on its own doesn't seem worth pushing for.  That said,
> > I have some comments below (and the already raised ones about how
> > things might need to change with gswitch support).  Is it so difficult to
> > develop gswitch support as a separate change ontop of this?
>
> Hello.
>
> Took me some time, but I have a working version of the patch that makes both
> refactoring of the costing model and adds support for gswitch. For quite some
> time, I maintained 2 patches, but as commonly happens, I was forced doing 
> quite
> some rework. If we really want a separation for bisection purpose, I suggest 
> simple
> disabling of gswitch support?

It was really meant to ease review.  I'm now looking at the combined
patch, comments will
follow.

+static void
+clean_up_after_unswitching (const auto_edge_flag &ignored_edge_flag)
+{
+  basic_block bb;
+
...
+  /* Clean up the ignored_edge_flag from edges.  */
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      edge e;

you can probably clean up outgoing edge flags in the loop above?

(I'm randomly jumping)

+             /* Build compound expression for all cases leading
+                to this edge.  */
+             tree expr = NULL_TREE;
+             for (unsigned i = 1; i < gimple_switch_num_labels (stmt); ++i)
+               {
+                 tree lab = gimple_switch_label (stmt, i);
+                 basic_block dest = label_to_block (cfun, CASE_LABEL (lab));
+                 edge e2 = find_edge (gimple_bb (stmt), dest);
+                 if (e == e2)

just as a style note I prefer if (e != e2) continue; so the following code
needs less indentation

+                         tree cmp1 = build2 (GE_EXPR, boolean_type_node, idx,
+                                             CASE_LOW (lab));

is there a reason you are not using fold_build2?  Do we want to somehow account
for the built expression size or maybe have a separate limit for the number of
cases we want to combine this way?

+                 unswitch_predicate *predicate
+                   = new unswitch_predicate (expr, idx, edge_index);
+                 ranger->gori ().outgoing_edge_range_p
(predicate->true_range, e,
+                                                        idx,
*get_global_range_query ());
+                 /* Huge switches are not supported by Ranger.  */
+                 if (predicate->true_range.undefined_p ())

I hope ranger will set the range to varying_p () in that case, not
undefined?  But even
then, is that a reason for punting?  I guess we fail to prune cases in
that case but
the cost modeling should then account for those and thus we are at
least consistent?

+                   {
+                     delete predicate;
+                     return;

In this context, when we do

+      if (operand_equal_p (predicate->lhs, last_predicate->lhs, 0))
+       {
+         irange &other = (true_edge ? predicate->merged_true_range
+                          : predicate->merged_false_range);
+         last_predicate->merged_true_range.intersect (other);
+         last_predicate->merged_false_range.intersect (other);
+         return;

ranger may be conservative when intersecting (and hopefully not end up
with undefined_p - heh).  I also am confused about intersecting both
merged_true_range and merged_false_range with 'other'?  I would
have expected to merge true edge info with true edge info and thus
only "swap" things somehow?  OTOH "path" suggests we're dealing
with more than one edge and associated ranges?  Maybe expanding
the comment on the predicate_vector would be useful.  AFAIR we
there store the sequence of unswitchings done with pairs of
the predicate unswitched on and a bool indicating whether we're
dealing with the copy that had the predicate true or the one that
had it false.

+  unswitch_predicate *predicate = NULL;
+  basic_block bb = NULL;
+  if (num > param_max_unswitch_level)
+    {
+      if (dump_enabled_p ())
+       dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+                        "Not unswitching anymore, hit max level\n");
+      goto exit;
+    }

I'll notice that given we have the overall size budget limiting the number
of unswitchings itself is probably unnecessary (as noted above we might
need to account for the size of the unswitch condition).

+  for (unsigned i = 0; i != loop->num_nodes; i++)
+    {
+      vec<unswitch_predicate *> &predicates = get_predicates_for_bb (bbs[i]);
+      if (!predicates.is_empty ())
+       {

same comment about indenting

I wonder whether evaluate_control_stmt_using_entry_checks could set
ignored_edge_flag itself instead of communicating via a hash_set?

It's not exactly clear what we use pred->handled for, do we re-discover
and re-try predicates when unswitching another level otherwise?  But
we _do_ want to unswitch the same switch stmt again, no?  And since
the BB predicate is keyed on the switch stmt that wouldn't work?
I wonder whether on recursive invocations of tree_unswitch_single_loop
we'd want to skip over unreachable BBs in the loop when looking
for candidates (when we compute BB predicates we skip over them
but that's done before all cases).  Ah, the BB predicates are a vector,
I wonder why we not simply remove the predicate from the BB predicate
vector when we decide to unswitch on it and thus append it to a predicate
path?

Can you please amend the toplevel file comment as to the new flow of
things?

For the switch unswitching testcases I wonder if we can add dump scanning
to verify we do not end up with duplicate cases across the loop versions?

Overall it looks reasonable but I hope for some simplification still.  I also
think it's something for stage1 now :/

After addressing comments above can you put the patch on a branch
so I can play & fixup things a bit when I run into them?  Often it's easier
to just do a change instead of writing up a review comment and expecting
that to be point-on ;)

Thanks,
Richard.

>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> SPEC 2006 and 2017 survive running -O3 -march=native. There are 2090 
> optimization notes
> for SPEC 2006 (compared to 1945 before the patch).
>
> Thoughts?
> Thanks,
> Martin

Reply via email to