Hello, I faced a similar issue a while ago. I filed a bug report (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36712) In the end, I implemented a simple tree-level unrolling pass in our port which uses all the existing infrastructure. It works quite well for our purpose, but I hesitated to submit the patch because it contains our not-very-elegannt #prgama unroll implementation.
The following two functions do the unrolling. I insert the pass just after the loop_prefetch pass (4.4) Cheers, Bingfeng Mei /* Perfect unrolling of a loop */ static void tree_unroll_perfect_loop (struct loop *loop, unsigned factor, edge exit) { sbitmap wont_exit; edge e; bool ok; unsigned i; VEC (edge, heap) *to_remove = NULL; /* Unroll the loop and remove the exits in all iterations except for the last one. */ wont_exit = sbitmap_alloc (factor); sbitmap_ones (wont_exit); RESET_BIT (wont_exit, factor - 1); ok = gimple_duplicate_loop_to_header_edge (loop, loop_latch_edge (loop), factor - 1, wont_exit, exit, &to_remove, DLTHE_FLAG_UPDATE_FREQ); free (wont_exit); gcc_assert (ok); for (i = 0; VEC_iterate (edge, to_remove, i, e); i++) { ok = remove_path (e); gcc_assert (ok); } VEC_free (edge, heap, to_remove); update_ssa (TODO_update_ssa); #ifdef ENABLE_CHECKING verify_flow_info (); verify_dominators (CDI_DOMINATORS); verify_loop_structure (); verify_loop_closed_ssa (); #endif } /* Go through all the loops: 1. Determine unrolling factor 2. Unroll loops in different conditions -- perfect loop: no extra copy of original loop -- other loops: the original version of loops to execute the remaining iterations */ static unsigned int rest_of_tree_unroll (void) { loop_iterator li; struct loop *loop; unsigned ninsns, unroll_factor; HOST_WIDE_INT est_niter; struct tree_niter_desc desc; bool unrolled = false; initialize_original_copy_tables (); /* Scan the loops, inner ones first. */ FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST) { est_niter = estimated_loop_iterations_int (loop, false); ninsns = tree_num_loop_insns (loop, &eni_size_weights); unroll_factor = determine_unroll_factor (loop, ninsns, &desc, est_niter); if (unroll_factor != 1) { tree niters = number_of_exit_cond_executions(loop); bool perfect_unrolling = false; if(niters != NULL_TREE && niters!= chrec_dont_know && TREE_CODE(niters) == INTEGER_CST){ int num_iters = tree_low_cst(niters, 1); if((num_iters % unroll_factor) == 0) perfect_unrolling = true; } /* If no. of iterations can be divided by unrolling factor, we have perfect unrolling */ if(perfect_unrolling){ tree_unroll_perfect_loop(loop, unroll_factor, single_dom_exit(loop)); } else{ tree_unroll_loop (loop, unroll_factor, single_dom_exit (loop), &desc); } unrolled = true; } } free_original_copy_tables (); /* Need to rebuild call graph due if new function calls are created due to loop unrolling FIXME: rebuild cgraph will lose some information like reason of not inlining*/ if(unrolled) rebuild_cgraph_edges(); /*debug_cgraph();*/ return 0; } > -----Original Message----- > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On > Behalf Of Jean Christophe Beyler > Sent: 14 October 2009 19:29 > To: Zdenek Dvorak > Cc: gcc@gcc.gnu.org > Subject: Re: Turning off unrolling to certain loops > > Ok, I've actually gone a different route. Instead of waiting for the > middle end to perform this, I've directly modified the parser stage to > unroll the loop directly there. > > Basically, I take the parser of the for and modify how it adds the > various statements. Telling it to, instead of doing in the > c_finish_loop : > > if (body) > add_stmt (body); > if (clab) > add_stmt (build1 (LABEL_EXPR, void_type_node, clab)); > if (incr) > add_stmt (incr); > ... > > I tell it to add multiple copies of body and incr and the at the end > add in the loop the rest of it. I've also added support to remove > further unrolling to these modified loops and will be handling the > "No-unroll" pragma. I then let the rest of the optimization passes, > fuse the incrementations together if possible, etc. > > The initial results are quite good and seem to work and > produce good code. > > Currently, there are two possibilities : > > - If the loop is not in the form we want, for example: > > for (;i<n;) > { > ... > } > > Do we still unroll even though we have to trust the user that the > number of unrolling will not break the semantics ? > > To handle this, I am adding warnings that will appear if the loop is > anything but : > > for (i=C1; i < C2; i ++) > { > ... > } > > Later on, once this is thoroughly tested, I will allow : > > for (i=C1; fct (i, C2); i = fct2 (i)) > > > where fct is any comparison function with only i and C2, > fct2 is a incrementation/decrementation calculation using i. > > > Any comments ? Concerns ? Questions ? > Thanks in advance, > Jc > > On Thu, Oct 8, 2009 at 12:22 PM, Jean Christophe Beyler > <jean.christophe.bey...@gmail.com> wrote: > > Hi, > > > >> such an epilogue is needed when the # of iterations is not > known in the > >> compile time; it should be fairly easy to modify the > unrolling not to > >> emit it when it is not necessary, > > > > Agreed, that is why I was surprised to see this in my > simple example. > > It seems to me that the whole unrolling process has been made to, on > > purpose, have this epilogue in place. > > > > In the case where the unrolling would be perfect (ie. there would be > > no epilogue), the calculation of the max bound of the > unrolled version > > is always done to have this epilogue (if you have 4 > iterations and ask > > to unroll twice, it will actually change the max bound to 3, > > therefore, having one iteration of the unrolled version and 2 > > iterations of the original...). I am currently looking at > the code of > > tree_transform_and_unroll_loop to figure out how to change this and > > not have an epilogue in my cases. > > > > Jc > > > >