On Tue, 27 Aug 2013, Xinliang David Li wrote:

> Richard, I have some comments about the patch.
>
> >   -ftree-vectorizer-verbose=<number>   This switch is deprecated. Use
> > -fopt-info instead.
> >
> >   ftree-slp-vectorize
> > ! Common Report Var(flag_tree_slp_vectorize) Optimization
> >   Enable basic block vectorization (SLP) on trees
>
> The code dealing with the interactions between -ftree-vectorize, O3,
> etc are complicated and hard to understand. Is it better to change the
> meaning of -ftree-vectorize to mean -floop-vectorize only, and make it
> independent of -fslp-vectorize? P

Yeah, but that would be an independent change.  Also people expect
to be able to enable all of the vectorizer with -ftree-vectorize.
So rather we introduce -floop-vectorize?

> > + fvect-cost-model=
> > + Common Joined RejectNegative Enum(vect_cost_model)
> > Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT)
> > + Specifies the cost model for vectorization
> > +
> > + Enum
> > + Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown
> > vectorizer cost model %qs)
> > +
> > + EnumValue
> > + Enum(vect_cost_model) String(unlimited) Value(VECT_COST_MODEL_UNLIMITED)
> > +
> > + EnumValue
> > + Enum(vect_cost_model) String(dynamic) Value(VECT_COST_MODEL_DYNAMIC)
> > +
> > + EnumValue
> > + Enum(vect_cost_model) String(cheap) Value(VECT_COST_MODEL_CHEAP)
>
> Introducing cheap model is a great change.
>
> > +
>
> > *** 173,179 ****
> >   {
> >     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> >
> > !   if ((unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) ==
> > 0)
> >       return false;
> >
> >     if (dump_enabled_p ())
> > --- 173,180 ----
> >   {
> >     struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> >
> > !   if (loop_vinfo->cost_model == VECT_COST_MODEL_CHEAP
> > !       || (unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS)
> > == 0)
> >       return false;
> >
>
> When the cost_model == cheap, the alignment peeling should also be
> disabled -- there will still be loops that are beneficial to be
> vectorized without peeling -- at perhaps reduced net runtime gain.

IIRC there are targets that cannot vectorize unaligned accesses, so
in the end the cost model needs to be more target-controlled.

The above was just a start for experimenting, of course.

> >   struct gimple_opt_pass pass_slp_vectorize =
> > --- 206,220 ----
> >   static bool
> >   gate_vect_slp (void)
> >   {
> > !   /* Apply SLP either according to whether the user specified whether to
> > !      run SLP or not, or according to whether the user specified whether
> > !      to do vectorization or not.  */
> > !   if (global_options_set.x_flag_tree_slp_vectorize)
> > !     return flag_tree_slp_vectorize != 0;
> > !   if (global_options_set.x_flag_tree_vectorize)
> > !     return flag_tree_vectorize != 0;
> > !   /* And if vectorization was enabled by default run SLP only at -O3.  */
> > !   return flag_tree_vectorize != 0 && optimize == 3;
> >   }
>
> The logic can be greatly simplified if slp vectorizer is controlled
> independently -- easier for user to understand too.

It should work with separating out -floop-vectorize, too I guess.  But
yes, as I wanted to preserve behavior of adding -ftree-vectorize to -O2
the above necessarily became quite complicated ;)

> > ! @item -fvect-cost-model=@var{model}
> >   @opindex fvect-cost-model
> > ! Alter the cost model used for vectorization.  The @var{model} argument
> > ! should be one of @code{unlimited}, @code{dynamic} or @code{cheap}.
> > ! With the @code{unlimited} model the vectorized code-path is assumed
> > ! to be profitable while with the @code{dynamic} model a runtime check
> > ! will guard the vectorized code-path to enable it only for iteration
> > ! counts that will likely execute faster than when executing the original
> > ! scalar loop.  The @code{cheap} model will disable vectorization of
> > ! loops where doing so would be cost prohibitive for example due to
> > ! required runtime checks for data dependence or alignment but otherwise
> > ! is equal to the @code{dynamic} model.
> > ! The default cost model depends on other optimization flags and is
> > ! either @code{dynamic} or @code{cheap}.
> >
>
> Vectorizer in theory will only vectorize a loop with net runtime gain,
> so the 'cost' here should only mean code size and compile time cost.

Not exactly - for 'unlimited' we may enter the vectorized path
even if the overhead of the guards, prologue and epilogue exceeds
the benefit of the (eventually never entered) vectorized loop.
That is, the 'dynamic' model does

  if (n > profitable-iters)
    {
      if (alias checks, align checks)
        {
          prologue loop
          vectorized loop
          epilogue loop
        }
      else
        goto scalar loop
    }
  else
    scalar loop

because clearly the more complicated flow is not always profitable
to enter.

> Cheap Model: with this model, the compiler will vectorize loops that
> are considered beneficial for runtime performance with minimal code
> size increase and compile time cost
> Unlimited Model: compiler will vectorize loops to maximize runtime
> gain without considering compile time cost and impact to code size;

... and runtime speed

But you are right - changing the wording to tell what it will
vectorize as opposed to what not would be an improvement.

Richard.
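
For reference, the guard structure of the 'dynamic' model and how the
other two models differ from it can be sketched as a small standalone C
model.  This is only an illustration, not vectorizer code: every
identifier below (cost_model, loop_facts, choose_path, ...) is
hypothetical and not a GCC name.  It also assumes that 'unlimited' only
drops the iteration-count guard while still honouring alias/alignment
checks, and that 'cheap' refuses any loop that would need such checks.

```c
#include <stdbool.h>

/* Hypothetical names throughout; none of these are GCC identifiers.  */
enum cost_model { MODEL_UNLIMITED, MODEL_DYNAMIC, MODEL_CHEAP };
enum loop_path { PATH_SCALAR, PATH_VECTOR };

struct loop_facts
{
  unsigned niters;            /* runtime iteration count n */
  unsigned profitable_iters;  /* estimated break-even iteration count */
  bool needs_runtime_checks;  /* alias or alignment checks are required */
  bool runtime_checks_pass;   /* outcome of those checks at run time */
};

/* Compile-time decision: is a vectorized version emitted at all?
   Assumption: 'cheap' rejects loops that need runtime checks.  */
static bool
emits_vector_version (enum cost_model m, const struct loop_facts *f)
{
  if (m == MODEL_CHEAP && f->needs_runtime_checks)
    return false;
  return true;
}

/* Run-time decision: which path does an execution of the loop take?  */
static enum loop_path
choose_path (enum cost_model m, const struct loop_facts *f)
{
  if (!emits_vector_version (m, f))
    return PATH_SCALAR;

  /* 'dynamic' and 'cheap' guard on the iteration count;
     'unlimited' assumes the vectorized path is profitable.  */
  if (m != MODEL_UNLIMITED && f->niters <= f->profitable_iters)
    return PATH_SCALAR;

  /* Alias/alignment checks are needed for correctness in any model
     that still emits a versioned loop.  */
  if (f->needs_runtime_checks && !f->runtime_checks_pass)
    return PATH_SCALAR;

  /* Prologue loop, vectorized loop, epilogue loop.  */
  return PATH_VECTOR;
}
```

With n = 4 and a break-even count of 8, this sketch takes the scalar
loop under 'dynamic' but the vectorized path under 'unlimited', which is
the case where the guard, prologue and epilogue overhead can exceed the
benefit.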