On Thu, 4 Oct 2012, Jan Hubicka wrote:

> > > So SOC cancels out in the runtime check.
> > > I still think we need two formulas - one determining if vectorization is
> > > profitable, the other specifying the threshold for the scalar path at
> > > runtime (that will generally give lower values).
> >
> > True, we want two values.  But part of the scalar path right now
> > is all the computation required for alias and alignment runtime checks
> > (because of the way all the conditions are combined).
> >
> > I'm not much into the details of what we account for in SOC (I suppose
> > it's everything we insert in the preheader of the vector loop).
>
> Yes, it seems to contain everything we insert prior to the loop, in
> unfolded form.
>
> > +      if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> > +        fprintf (vect_dump, "not vectorized: estimated iteration count "
> > +                 "too small.");
> > +      if (vect_print_dump_info (REPORT_DETAILS))
> > +        fprintf (vect_dump, "not vectorized: estimated iteration count "
> > +                 "smaller than user specified loop bound parameter or "
> > +                 "minimum profitable iterations (whichever is more "
> > +                 "conservative).");
> >
> > this won't work anymore btw - dumping infrastructure changed.
>
> Ah, will update that.
>
> > I suppose your patch is a step in the right direction, but to really
> > make progress we need to re-organize the loop and predicate structure
> > produced by the vectorizer.
>
> This reminds me of what I did for string functions on x86.  It gets very
> hard to get all the paths right when one starts to be really careful not
> to output too much cruft on the short paths and not to consume too many
> registers.
>
> In fact I want to re-think this for the SSE string ops patch, so I may
> try to look into that incrementally.
>
> > So, please update your patch, re-test and then it's ok.
>
> Thanks.
>
> > > I tested enabling loop_ch in early passes with -fprofile-feedback and
> > > it is SPEC neutral.
> > > Given that it improves loop count estimates, I would still like
> > > mainline doing that.  I do not like these quite important estimates
> > > to be wrong most of the time.
> >
> > I agree.  It also helps getting rid of once-rolling loops I think.
>
> I am attaching the patch for early-ch.  Will commit it tomorrow.
>
> Concerning jump threading, it would help to do some of it during early
> passes so the profile estimates do not get invalidated.  I tried to move
> VRP early, but now it makes the compiler hang during bootstrap.  I will
> debug that.
>
> > > > Btw, I added a "similar" check in vect_analyze_loop_operations:
> > > >
> > > >   if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > >        && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
> > > >       || ((max_niter = max_stmt_executions_int (loop)) != -1
> > > >           && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
> > > >     {
> > > >       if (dump_kind_p (MSG_MISSED_OPTIMIZATION))
> > > >         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > >                          "not vectorized: iteration count too small.");
> > > >       if (dump_kind_p (MSG_MISSED_OPTIMIZATION))
> > > >         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > >                          "not vectorized: iteration count smaller than "
> > > >                          "vectorization factor.");
> > > >       return false;
> > > >     }
> > > >
> > > > maybe you simply need to update that to also consider the profile?
> > >
> > > Hmm, I am still getting familiar with the code.
> > > Later we have
> > >
> > >   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > >       && LOOP_VINFO_INT_NITERS (loop_vinfo) <= th)
> > >     {
> > >       if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> > >         fprintf (vect_dump, "not vectorized: vectorization not "
> > >                  "profitable.");
> > >       if (vect_print_dump_info (REPORT_DETAILS))
> > >         fprintf (vect_dump, "not vectorized: iteration count smaller "
> > >                  "than user specified loop bound parameter or minimum "
> > >                  "profitable iterations (whichever is more "
> > >                  "conservative).");
> > >       return false;
> > >     }
> > >
> > > where th is always greater than or equal to vectorization_factor from
> > > the cost model.  So this test seems redundant if the
> > > max_stmt_executions_int check was pushed down to the second
> > > conditional?
> >
> > Yes, sort of.  The new check was supposed to be crystal clear, and
> > even with the cost model disabled we want to not vectorize in this
> > case.  But yes, the whole cost-model stuff needs TLC.
>
> Ah yes, without the cost model we would skip it.  I suppose we do not
> need to bother with the profile estimate in that case anyway.  It is
> kind of part of the cost model.
>
> 	* passes.c (init_optimization_passes): Schedule early CH.
> 	* tree-pass.h (pass_early_ch): Declare it.
> 	* tree-ssa-loop-ch.c (gate_early_ch): New function.
> 	(pass_early_ch): New pass.
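[Editorial illustration: the runtime guard being discussed, where a loop whose
iteration count is at most the cost-model threshold `th` falls back to the
scalar path, can be sketched outside of GCC roughly as below.  This is not the
vectorizer's actual code generation; `VF`, `TH`, and `sum` are made-up
placeholders, with `TH >= VF` mirroring the observation that `th` is always
at least the vectorization factor.]

```c
#include <assert.h>

#define VF 4  /* stand-in vectorization factor */
#define TH 7  /* stand-in cost-model threshold, >= VF */

/* Sum an array the way a loop vectorized with a runtime profitability
   check might run: iteration counts at or below TH take the scalar
   path outright; larger counts run the "vector" body VF elements at
   a time and finish the remainder in a scalar epilogue.  */
static int
sum (const int *a, int n)
{
  int s = 0, i = 0;

  if (n > TH)                      /* runtime cost-model guard */
    for (; i + VF <= n; i += VF)   /* "vectorized" body        */
      s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];

  for (; i < n; i++)               /* scalar path and epilogue */
    s += a[i];

  return s;
}
```

[Both paths compute the same value; the threshold only keeps short-running
loops off the setup-heavy vector path.]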
Instead of an extra ch pass please move the existing one.  And benchmark
that, of course.

Richard.

> Index: passes.c
> ===================================================================
> --- passes.c	(revision 191852)
> +++ passes.c	(working copy)
> @@ -1335,6 +1336,7 @@ init_optimization_passes (void)
>  	  NEXT_PASS (pass_cleanup_eh);
>  	  NEXT_PASS (pass_profile);
>  	  NEXT_PASS (pass_local_pure_const);
> +	  NEXT_PASS (pass_early_ch);
>  	  /* Split functions creates parts that are not run through
>  	     early optimizations again.  It is thus good idea to do this
>  	     late.  */
> Index: tree-pass.h
> ===================================================================
> --- tree-pass.h	(revision 191852)
> +++ tree-pass.h	(working copy)
> @@ -291,6 +291,7 @@ extern struct gimple_opt_pass pass_loop_
>  extern struct gimple_opt_pass pass_iv_optimize;
>  extern struct gimple_opt_pass pass_tree_loop_done;
>  extern struct gimple_opt_pass pass_ch;
> +extern struct gimple_opt_pass pass_early_ch;
>  extern struct gimple_opt_pass pass_ccp;
>  extern struct gimple_opt_pass pass_phi_only_cprop;
>  extern struct gimple_opt_pass pass_build_ssa;
> Index: tree-ssa-loop-ch.c
> ===================================================================
> --- tree-ssa-loop-ch.c	(revision 191852)
> +++ tree-ssa-loop-ch.c	(working copy)
> @@ -275,3 +275,34 @@ struct gimple_opt_pass pass_ch =
>      | TODO_verify_flow			/* todo_flags_finish */
>   }
>  };
> +
> +/* We duplicate loop headers early to get better profile feedback:
> +   the first iteration of loop is special, because the duplicated
> +   loop header test will usually pass.
> +   */
> +
> +static bool
> +gate_early_ch (void)
> +{
> +  return flag_tree_ch != 0
> +	 && (flag_branch_probabilities || profile_arc_flag);
> +}
> +
> +struct gimple_opt_pass pass_early_ch =
> +{
> + {
> +  GIMPLE_PASS,
> +  "early_ch",			/* name */
> +  gate_early_ch,		/* gate */
> +  copy_loop_headers,		/* execute */
> +  NULL,				/* sub */
> +  NULL,				/* next */
> +  0,				/* static_pass_number */
> +  TV_TREE_CH,			/* tv_id */
> +  PROP_cfg | PROP_ssa,		/* properties_required */
> +  0,				/* properties_provided */
> +  0,				/* properties_destroyed */
> +  0,				/* todo_flags_start */
> +  TODO_cleanup_cfg
> +    | TODO_verify_ssa
> +    | TODO_verify_flow		/* todo_flags_finish */
> + }
> +};

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend
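[Editorial illustration: for readers unfamiliar with the pass under
discussion, loop header copying (`pass_ch` / `copy_loop_headers`) peels one
copy of the loop's exit test in front of the loop, turning a test-at-top loop
into a guarded do-while.  A rough before/after sketch in plain C follows;
this is illustrative only, not GCC's GIMPLE-level transform, and the
function names are made up.]

```c
/* Before header copying: the exit test guards every iteration,
   including the first, so one branch mixes "is the loop entered at
   all?" with "does it iterate again?".  */
static int
sum_before (const int *a, int n)
{
  int s = 0, i = 0;
  while (i < n)        /* header test, executed n + 1 times */
    {
      s += a[i];
      i++;
    }
  return s;
}

/* After header copying: the first test is duplicated in front of a
   do-while loop.  Once the loop is entered, the remaining (latch)
   test almost always succeeds, which is what makes the resulting
   edge profile, and hence loop count estimates, more accurate.  */
static int
sum_after (const int *a, int n)
{
  int s = 0, i = 0;
  if (i < n)           /* duplicated loop header */
    do
      {
        s += a[i];
        i++;
      }
    while (i < n);     /* latch test */
  return s;
}
```

[The two functions are behaviorally identical; running the duplication before
profile instrumentation lets the entry test and the latch test be measured
separately, which is the motivation for gating the early pass on
profile_arc_flag or flag_branch_probabilities.]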