Bernd Schmidt <ber...@codesourcery.com> writes: >> Is there no possibility of running the optimisation in cfglayout mode >> instead? It seems from this and the forwarder block stuff above as >> though it might make things easier, but maybe not. > > I'm not sure what you mean here. This reordering isn't for the sake of > the optimization code; it's necessary due to the encoding of the > Blackfin LSETUP instruction which assumes that the loop start label is > before the loop end label. > > The optimizing functions definitely need to see RTL that corresponds > directly to generated assembly, since they need to count instruction > sizes. So I think cfglayout is out for that.
Yeah, good point. >>> + /* Every loop contains in its list of inner loops every loop nested >>> inside >>> + it, even if there are intermediate loops. This works because we're >>> doing >>> + a depth-first search here and never visit a loop more than once. */ >>> + for (ix = 0; VEC_iterate (loop_info, loop->loops, ix, inner); ix++) >>> + { >>> + optimize_loop (inner, hooks); >> >> Perhaps use a worklist here, to avoid O(loops)-deep recursion? > > Well, it's O(counting loop nesting level) deep, which I think isn't much > of a problem in practice. One could even argue that it's a O(1) since > the register allocator will eventually run out of loop iteration registers. Ah yeah, I hadn't thought about FIRST_PSEUDO_REGISTERS being an effective cap on it. I think it'd be worth adding a comment along the lines of: Note that the depth of recursion is limited by FIRST_PSEUDO_REGISTER, because loops are tied directly to an iterator register. above the inner optimize_loops calls (or whereever you think is best), just to emphasise why recursion isn't a problem. > Thanks for the review - new patch below. I think this addresses > everything. Yeah, looks good. Thanks especially for all the extra commentary, both in the "headline" places and elsewhere. It really helps clarify what's going on. > * hw-doloop.c: New file. > * hw-doloop.h: New file. > * Makefile.in (OBJS): Add hw-doloop.o. > (hw-doloop.o): New rule. > ($(obj_out_file)): Add hw-doloop.h dependency. > * config/bfin/bfin.c: Include "hw-doloop.h". > (loop_info, DEF_VEC_P for loop_info, loop_info_d): Remove. > (bfin_dump_loops, bfin_bb_in_loop, bfin_scan_loop): Remove. > (hwloop_optimize): Renamed from bfin_optimize_loop. Argument > type changed to hwloop_info. Return bool, true if the loop was > successfully optimized. Remove code that was moved to > hw-doloop.c, and adjust other parts. > (hwloop_fail): New static function, containing parts that used > to be in bfin_optimize_loop. > (bfin_discover_loop, bfin_discover_loops, free_loops, > bfin_reorder_loops): Remove. > (hwloop_pattern_reg): New static function. > (bfin_doloop_hooks): New variable. > (bfin_reorg_loops): Remove most code, call reorg_loops. > * config/bfin/bfin.md (doloop_end splitter): Also enable if > loop counter is a memory_operand. The target-independent bits are OK with me with the changes below. Obviously everything else is in your port anyway. > +/* Try to identify a forwarder blocks that jump into LOOP, and add it to s/blocks/block/ > +#define BB_AUX_INDEX(BB) ((intptr_t)(BB)->aux) Missing space before (BB). > +/* Initialize the aux fields to give ascending indices to basic blocks. */ > +static void > +set_bb_indices (void) > +{ > + basic_block bb; > + unsigned index; > + > + index = 0; > + FOR_EACH_BB (bb) > + bb->aux = (PTR)index++; s/(PTR)/(void *) /. (I always associate PTR with the old K&R stuff.) More importantly, "index" needs to be intptr_t in order to avoid warnings. (Could you bootstrap this on x86_64 to check for things like that? A C bootstrap only should be fine of course, since the code isn't going to be run.) > + hwloop_info loops = NULL; Unnecessary initialisation (or at least, it should be). Richard