On Thu, Dec 9, 2021 at 2:02 PM Martin Liška <mli...@suse.cz> wrote: > > On 11/30/21 12:17, Richard Biener wrote: > > I'd like to see the gswitch support - that's what was posted before stage3 > > close, this patch on its own doesn't seem worth pushing for. That said, > > I have some comments below (and the already raised ones about how > > things might need to change with gswitch support). Is it so difficult to > > develop gswitch support as a separate change ontop of this? > > Hello. > > Took me some time, but I have a working version of the patch that makes both > refactoring of the costing model and adds support for gswitch. For quite some > time, I maintained 2 patches, but as commonly happens, I was forced doing > quite > some rework. If we really want a separation for bisection purpose, I suggest > simple > disabling of gswitch support?
It was really meant to ease review. I'm now looking at the combined patch, comments will follow. +static void +clean_up_after_unswitching (const auto_edge_flag &ignored_edge_flag) +{ + basic_block bb; + ... + /* Clean up the ignored_edge_flag from edges. */ + FOR_EACH_BB_FN (bb, cfun) + { + edge e; you can probably clean up outgoing edge flags in the loop above? (I'm randomly jumping) + /* Build compound expression for all cases leading + to this edge. */ + tree expr = NULL_TREE; + for (unsigned i = 1; i < gimple_switch_num_labels (stmt); ++i) + { + tree lab = gimple_switch_label (stmt, i); + basic_block dest = label_to_block (cfun, CASE_LABEL (lab)); + edge e2 = find_edge (gimple_bb (stmt), dest); + if (e == e2) just as a style note I prefer if (e != e2) continue; so the following code needs less indentation + tree cmp1 = build2 (GE_EXPR, boolean_type_node, idx, + CASE_LOW (lab)); is there a reason you are not using fold_build2? Do we want to somehow account for the built expression size or maybe have a separate limit for the number of cases we want to combine this way? + unswitch_predicate *predicate + = new unswitch_predicate (expr, idx, edge_index); + ranger->gori ().outgoing_edge_range_p (predicate->true_range, e, + idx, *get_global_range_query ()); + /* Huge switches are not supported by Ranger. */ + if (predicate->true_range.undefined_p ()) I hope ranger will set the range to varying_p () in that case, not undefined? But even then, is that a reason for punting? I guess we fail to prune cases in that case but the cost modeling should then account for those and thus we are at least consistent? + { + delete predicate; + return; In this context, when we do + if (operand_equal_p (predicate->lhs, last_predicate->lhs, 0)) + { + irange &other = (true_edge ? predicate->merged_true_range + : predicate->merged_false_range); + last_predicate->merged_true_range.intersect (other); + last_predicate->merged_false_range.intersect (other); + return; ranger may be conservative when intersecting (and hopefully not end up with undefined_p - heh). I also am confused about intersecting both merged_true_range and merged_false_range with 'other'? I would have expected to merge true edge info with true edge info and thus only "swap" things somehow? OTOH "path" suggests we're dealing with more than one edge and associated ranges? Maybe expanding the comment on the predicate_vector would be useful. AFAIR we there store the sequence of unswitchings done with pairs of the predicate unswitched on and a bool indicating whether we're dealing with the copy that had the predicate true or the one that had it false. + unswitch_predicate *predicate = NULL; + basic_block bb = NULL; + if (num > param_max_unswitch_level) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc, + "Not unswitching anymore, hit max level\n"); + goto exit; + } I'll notice that given we have the overall size budget limiting the number of unswitchings itself is probably unnecessary (as noted above we might need to account for the size of the unswitch condition). + for (unsigned i = 0; i != loop->num_nodes; i++) + { + vec<unswitch_predicate *> &predicates = get_predicates_for_bb (bbs[i]); + if (!predicates.is_empty ()) + { same comment about indenting I wonder whether evaluate_control_stmt_using_entry_checks could set ignored_edge_flag itself instead of communicating via a hash_set? It's not exactly clear what we use pred->handled for, do we re-discover and re-try predicates when unswitching another level otherwise? But we _do_ want to unswitch the same switch stmt again, no? And since the BB predicate is keyed on the switch stmt that wouldn't work? I wonder whether on recursive invocations of tree_unswitch_single_loop we'd want to skip over unreachable BBs in the loop when looking for candidates (when we compute BB predicates we skip over them but that's done before all cases). Ah, the BB predicates are a vector, I wonder why we not simply remove the predicate from the BB predicate vector when we decide to unswitch on it and thus append it to a predicate path? Can you please amend the toplevel file comment as to the new flow of things? For the switch unswitching testcases I wonder if we can add dump scanning to verify we do not end up with duplicate cases across the loop versions? Overall it looks reasonable but I hope for some simplification still. I also think it's something for stage1 now :/ After addressing comments above can you put the patch on a branch so I can play & fixup things a bit when I run into them? Often it's easier to just do a change instead of writing up a review comment and expecting that to be point-on ;) Thanks, Richard. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > SPEC 2006 and 2017 survive running -O3 -march=native. There are 2090 > optimization notes > for SPEC 2006 (compared to 1945 before the patch). > > Thoughts? > Thanks, > Martin