Steven, Thanks for the feedback. I've committed the original patch as-is, but I'm happy to improve it with follow up patches.
On Thu, 2013-11-21 at 00:04 +0100, Steven Bosscher wrote: > On Wed, Nov 20, 2013 at 8:29 PM, Oleg Endo wrote: > > > > The attached patch adds another SH specific RTL pass which is supposed > > to optimize clrt and sett instruction usage. As a start, it currently > > just eliminates redundant clrt and sett instructions in cases where the > > T bit value is known. However, I'm planning on extending it a little in > > order to e.g. hoist clrt/sett insns out of loops etc. > > +#include <vector> Is there something wrong with <vector> ? > > +#define log_msg(...)\ > > + do { if (dump_file != NULL) fprintf (dump_file, __VA_ARGS__); } while (0) > > Is that valid C++98, a varags macro? No it's not. It's C99 / C++11. However, most compilers support __VA_ARGS__. If it causes too much trouble I'll remove it of course (sh_treg_combine.cc would also be affected). Having to write "if (dump_file ...)" stuff over and over again is annoying and impacts the readability of code in my opinion. So if the __VA_ARGS__ usage has to go, there should be some more or less alternative. I think other passes would also benefit from that. BTW something std::ostream like would be nice for logging, like... log_msg ("updated ccreg mode: "); log_rtx (m_ccreg); log_msg ("\n\n"); ... could be something like log (<< "updated ccreg mode: " << m_ccreg << "\n\n"); where log is: #define log(expr)\ do { if (dump_file != NULL) <logging ostream ref> expr; } while (0) Since there are various ways of printing an rtx (print_rtl, print_rtl_single), this could be done with a wrapper object around the logged rtx: log (<< "got insn: \n" << rtx::log_single (my_insn) << "\n"); or iomanip style (requires custom ostream): log (<< "got insn: \n" << rtx::log_single << my_insn << "\n"); ... but I'm not sure how others think about that. If this is of interest I could do some work in that direction. > > + // Notice that the CFG might be invalid at late RTL stages and > > + // BLOCK_FOR_INSN might return null. Thus the basic block are recorded > > + // here while traversing them. > > + basic_block bb; > > You insert your pass just before sched2. The CFG is perfectly fine at > that stage. So you shouldn't need this. (And if you would need to > record bb, then this solution wouldn't be GC-safe). Why is that? AFAIK GC is done after the pass has finished? The pass doesn't keep any state beyond the current pass execution. > > BLOCK_FOR_INSN will only be NULL for things that are not inside a > basic block (some notes, deleted labels, etc.). > > That all said and done: AFAICT you don't actually use BLOCK_FOR_INSN > anywhere :-) Sorry for the confusion. Initially I had it inserted right before machine dependent reorg, at which point I was getting nullptr from BLOCK_FOR_INSN all over the place. So not using it was the fix. However, only after I've noticed that SH's machine dependent reorg leaves the basic block structure in a 'broken' state (PR 59189) and thus I moved the pass before sched2. I'll try using BLOCK_FOR_INSN again. > > > + for (edge_iterator ei = ei_start (bb->preds); !ei_end_p (ei); ei_next > > (&ei)) > > FOR_EACH_EDGE > > Declaring the edge_iterator inside the for() is not a good argument > against FOR_EACH_EDGE. But that was the only reason! ;) (I've done the same in sh_treg_combine.cc) Can we ... > Of course, brownie points are up for grabs for > the brave soul daring enough to make edge iterators be proper C++ > iterators... ;-) ... fix this first and use the SH RTL passes as examples. Then migrate other existing code? Or do you insist on FOR_EACH_EDGE usage for the time being? > > + if (pred_bb->index == ENTRY_BLOCK) > > I used to dislike this idiom of checking bb->index against fixed block > numbers. But now that ENTRY_BLOCK_PTR and EXIT_BLOCK_PTR actually > require two pointer dereferences (cfun->cfg->...) it's the better > option. > > /me adds to TODO list... Actually I don't need any of that. I will re-test the following (although it's quite obvious): Index: gcc/config/sh/sh_optimize_sett_clrt.cc =================================================================== --- gcc/config/sh/sh_optimize_sett_clrt.cc (revision 205191) +++ gcc/config/sh/sh_optimize_sett_clrt.cc (working copy) @@ -309,9 +309,6 @@ std::vector<ccreg_value>& values_out, basic_block prev_visited_bb) const { - if (start_insn == NULL_RTX) - return; - log_msg ("looking for ccreg values in [bb %d]\n", bb->index); for (rtx i = start_insn; i != NULL_RTX && i != PREV_INSN (BB_HEAD (bb)); @@ -376,9 +373,6 @@ for (edge_iterator ei = ei_start (bb->preds); !ei_end_p (ei); ei_next (&ei)) { basic_block pred_bb = ei_edge (ei)->src; - if (pred_bb->index == ENTRY_BLOCK) - continue; - pred_bb_count += 1; find_last_ccreg_values (BB_END (pred_bb), pred_bb, values_out, bb); } > > > +::remove_ccreg_dead_unused_notes (std::vector<ccreg_value>& values) const > > Maybe put a generic version of this in rtlanal.c, next to > remove_reg_equal_equiv_notes_for_regno? Yes. In fact I wanted to propose moving some of the helper functions from sh_treg_combine.cc and sh_optimize_sett_clrt.cc to rtl.h / rtlanal.c at some point in time. > On the whole: The pass will have worst-case run time quadratic in the > number of basic blocks, due to the recursion in > find_last_ccreg_values. You'll probably want to limit the depth of the > iteration somehow (hard-coded depth, common dominator, first > post-dominated block, whatever...) or sooner-or-later you'll have a PR > assigned to you for a test case that makes compile time explode in > this pass... A PR with a test case would be nice. In the worst case it'll have to be an iteration limit, but I can also imagine that caching results (hash_table) for bb subtrees could be useful here, couldn't it? Cheers, Oleg