On Mon, 18 Jun 2012, William J. Schmidt wrote: > On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote: > > On Fri, 8 Jun 2012, William J. Schmidt wrote: > > > <snip> > > > > Hmm. I don't like this patch or its general idea too much. Instead > > I'd like us to move more of the cost model detail to the target, giving > > it a chance to look at the whole loop before deciding on a cost. ISTR > > posting the overall idea at some point, but let me repeat it here instead > > of trying to find that e-mail. > > > > The basic interface of the cost model should be, in targetm.vectorize > > > > /* Tell the target to start cost analysis of a loop or a basic-block > > (if the loop argument is NULL). Returns an opaque pointer to > > target-private data. */ > > void *init_cost (struct loop *loop); > > > > /* Add cost for N vectorized-stmt-kind statements in vector_mode. */ > > void add_stmt_cost (void *data, unsigned n, > > vectorized-stmt-kind, > > enum machine_mode vector_mode); > > > > /* Tell the target to compute and return the cost of the accumulated > > statements and free any target-private data. */ > > unsigned finish_cost (void *data); > > > > with eventually slightly different signatures for add_stmt_cost > > (like pass in the original scalar stmt?). > > > > It allows the target, at finish_cost time, to evaluate things like > > register pressure and resource utilization. > > > > Thanks, > > Richard. > > I've been looking at this in between other projects. I wanted to be > sure I understood the SLP infrastructure and whether it would cause any > problems. It looks to me like it will be mostly ok. One issue I > noticed is a possible difference in the order in which SLP instructions > are analyzed and the order in which the instructions are "issued" during > transformation. > > For both loop analysis and basic block analysis, SLP trees are > constructed and analyzed prior to examining other vectorizable > instructions. Their costs are calculated and stored in the SLP trees at > this time. Later, when transforming statements to their vector > equivalents, instructions in the block (or loop body) are processed in > order until the first instruction that's part of an SLP tree is > encountered. At that point, every instruction that's part of any SLP > tree is transformed; then the vectorizer continues with the remaining > non-SLP vectorizable statements. > > So if we do the natural and easy thing of placing calls to add_stmt_cost > everywhere that costs are calculated today, the order that those costs > are presented to the back end model will possibly be different than the > order they are actually "emitted."
Interesting. But I suppose this is similar to how pattern statements are handled? Thus, the whole pattern sequence is processed when we encounter the "main" pattern statement? > For a first cut at this, I suggest ignoring the problem other than to > document it as an opportunity for improvement. Later we could improve > it by using an add_stmt_slp_cost () interface (or adding an is_slp > flag), and another interface to be called at the time during analysis > when the SLP statements will be issued during transformation. This > would allow the back end model to queue up the SLP costs in a separate > vector and later place them in its internal structures at the > appropriate place. > > It should eventually be possible to remove these fields/accessors: > > * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST > * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST > * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST > > However, I think this should be delayed until we have the basic > infrastructure in place for the new model and well-tested. Indeed. > The other issue is that we should have the model track both the inside > and outside costs if we're going to get everything into the target > model. For a first pass we can ignore this and keep the existing logic > for the outside costs. Later we should add some interfaces analogous to > add_stmt_cost such as add_stmt_prolog_cost and add_stmt_epilog_cost so > the model can track this stuff as carefully as it wants to. Outside costs are merely added to the niter * inner-cost metric to be compared with the scalar cost niter * scalar-cost, right? Thus they would be tracked completely separate - eventually similar to how we compute the cost of the scalar loop. > So, I'd propose going at this in several phases: > > (1) Add calls to the new interface without disturbing existing logic; > modify the profitability algorithms to query the new model for inside > costs. Default algorithm for the model is to just sum costs as is done > today. Right. > (x) Add heuristics to target models as desired. > (2) Handle the SLP ordering problem. > (3) Handle outside costs in the target model. > (4) Remove the now unnecessary cost fields and the calls that set them. > > Item (x) can happen anytime after item (1). > > I don't think this work is terribly difficult, just a bit tedious. The > only really time-consuming aspect of it will be in very careful testing > to keep from changing existing behavior. > > All comments welcome -- please let me know what you think. I think this is a reasonable plan. Thanks, Richard.