On Thu, 4 Oct 2012, Jan Hubicka wrote:

> > > So SOC cancels out in the runtime check.
> > > I still think we need two formulas - one determining if vectorization is
> > > profitable, the other specifying the threshold for the scalar path at
> > > runtime (that will generally give lower values).
> >
> > True, we want two values.  But part of the scalar path right now
> > is all the computation required for alias and alignment runtime checks
> > (because of the way all the conditions are combined).
> >
> > I'm not much into the details of what we account for in SOC (I suppose
> > it's everything we insert in the preheader of the vector loop).
>
> Yes, it seems to contain everything we insert prior to the loop, in
> unfolded form.
>
> > +      if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> > +        fprintf (vect_dump, "not vectorized: estimated iteration count "
> > +                 "too small.");
> > +      if (vect_print_dump_info (REPORT_DETAILS))
> > +        fprintf (vect_dump, "not vectorized: estimated iteration count "
> > +                 "smaller than user specified loop bound parameter or "
> > +                 "minimum profitable iterations (whichever is more "
> > +                 "conservative).");
> >
> > this won't work anymore btw - dumping infrastructure changed.
>
> Ah, will update that.
>
> > I suppose your patch is a step in the right direction, but to really
> > make progress we need to re-organize the loop and predicate structure
> > produced by the vectorizer.
>
> This reminds me of what I did for string functions on x86.  It gets very
> hard to get all the paths right when one starts to be really careful not
> to output too much cruft on the short paths and not to consume too many
> registers.
>
> In fact I want to re-think this for the SSE string ops patch, so I may
> try to look into that incrementally.
>
> > So, please update your patch, re-test and then it's ok.
>
> Thanks.
>
> > > I tested enabling loop_ch in early passes with -fprofile-feedback and
> > > it is SPEC neutral.
> > > Given that it improves loop count estimates, I would still like
> > > mainline doing that.  I do not like these quite important estimates
> > > to be wrong most of the time.
> >
> > I agree.  It also helps getting rid of once-rolling loops I think.
>
> I am attaching the patch for early-ch.  Will commit it tomorrow.
>
> Concerning jump threading, it would help to do some of it during early
> passes so the profile estimates do not get invalidated.  I tried to move
> VRP early, but now it makes the compiler hang during bootstrap.  I will
> debug that.
>
> > > > Btw, I added a "similar" check in vect_analyze_loop_operations:
> > > >
> > > >   if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > >        && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
> > > >       || ((max_niter = max_stmt_executions_int (loop)) != -1
> > > >           && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
> > > >     {
> > > >       if (dump_kind_p (MSG_MISSED_OPTIMIZATION))
> > > >         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > >                          "not vectorized: iteration count too small.");
> > > >       if (dump_kind_p (MSG_MISSED_OPTIMIZATION))
> > > >         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > >                          "not vectorized: iteration count smaller than "
> > > >                          "vectorization factor.");
> > > >       return false;
> > > >     }
> > > >
> > > > maybe you simply need to update that to also consider the profile?
> > >
> > > Hmm, I am still getting familiar with the code.
> > > Later we have
> > >
> > >   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > >       && LOOP_VINFO_INT_NITERS (loop_vinfo) <= th)
> > >     {
> > >       if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> > >         fprintf (vect_dump, "not vectorized: vectorization not "
> > >                  "profitable.");
> > >       if (vect_print_dump_info (REPORT_DETAILS))
> > >         fprintf (vect_dump, "not vectorized: iteration count smaller "
> > >                  "than user specified loop bound parameter or minimum "
> > >                  "profitable iterations (whichever is more "
> > >                  "conservative).");
> > >       return false;
> > >     }
> > >
> > > where th is always greater than or equal to vectorization_factor from
> > > the cost model.  So this test seems redundant if the
> > > max_stmt_executions_int check was pushed down to the second
> > > conditional?
> >
> > Yes, sort of.  The new check was supposed to be crystal clear, and
> > even with the cost model disabled we want to not vectorize in this
> > case.  But yes, the whole cost-model stuff needs TLC.
>
> Ah yes, without the cost model we would skip it.  I suppose we do not
> need to bother with the profile estimate in that case anyway.  It is
> kind of part of the cost model.
>
> 	* passes.c (init_optimization_passes): Schedule early CH.
> 	* tree-pass.h (pass_early_ch): Declare it.
> 	* tree-ssa-loop-ch.c (gate_early_ch): New function.
> 	(pass_early_ch): New pass.
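[Editorial illustration: the runtime guard being discussed, where a loop whose
iteration count is at most the cost-model threshold `th` falls back to the
scalar path, can be sketched outside of GCC roughly as below.  This is not the
vectorizer's actual code generation; `VF`, `TH`, and `sum` are made-up
placeholders, with `TH >= VF` mirroring the observation that `th` is always
at least the vectorization factor.]

```c
#include <assert.h>

#define VF 4  /* stand-in vectorization factor */
#define TH 7  /* stand-in cost-model threshold, >= VF */

/* Sum an array the way a loop vectorized with a runtime profitability
   check might run: iteration counts at or below TH take the scalar
   path outright; larger counts run the "vector" body VF elements at
   a time and finish the remainder in a scalar epilogue.  */
static int
sum (const int *a, int n)
{
  int s = 0, i = 0;

  if (n > TH)                      /* runtime cost-model guard */
    for (; i + VF <= n; i += VF)   /* "vectorized" body        */
      s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];

  for (; i < n; i++)               /* scalar path and epilogue */
    s += a[i];

  return s;
}
```

[Both paths compute the same value; the threshold only keeps short-running
loops off the setup-heavy vector path.]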
Instead of an extra ch pass please move the existing one.  And benchmark
that, of course.

Richard.

> Index: passes.c
> ===================================================================
> --- passes.c	(revision 191852)
> +++ passes.c	(working copy)
> @@ -1335,6 +1336,7 @@ init_optimization_passes (void)
>  	  NEXT_PASS (pass_cleanup_eh);
>  	  NEXT_PASS (pass_profile);
>  	  NEXT_PASS (pass_local_pure_const);
> +	  NEXT_PASS (pass_early_ch);
>  	  /* Split functions creates parts that are not run through
>  	     early optimizations again.  It is thus good idea to do this
>  	     late.  */
> Index: tree-pass.h
> ===================================================================
> --- tree-pass.h	(revision 191852)
> +++ tree-pass.h	(working copy)
> @@ -291,6 +291,7 @@ extern struct gimple_opt_pass pass_loop_
>  extern struct gimple_opt_pass pass_iv_optimize;
>  extern struct gimple_opt_pass pass_tree_loop_done;
>  extern struct gimple_opt_pass pass_ch;
> +extern struct gimple_opt_pass pass_early_ch;
>  extern struct gimple_opt_pass pass_ccp;
>  extern struct gimple_opt_pass pass_phi_only_cprop;
>  extern struct gimple_opt_pass pass_build_ssa;
> Index: tree-ssa-loop-ch.c
> ===================================================================
> --- tree-ssa-loop-ch.c	(revision 191852)
> +++ tree-ssa-loop-ch.c	(working copy)
> @@ -275,3 +275,34 @@ struct gimple_opt_pass pass_ch =
>      | TODO_verify_flow			/* todo_flags_finish */
>   }
>  };
> +
> +/* We duplicate loop headers early to get better profile feedback:
> +   the first iteration of loop is special, because the duplicated
> +   loop header test will usually pass.
> +   */
> +
> +static bool
> +gate_early_ch (void)
> +{
> +  return flag_tree_ch != 0
> +	 && (flag_branch_probabilities || profile_arc_flag);
> +}
> +
> +struct gimple_opt_pass pass_early_ch =
> +{
> + {
> +  GIMPLE_PASS,
> +  "early_ch",			/* name */
> +  gate_early_ch,		/* gate */
> +  copy_loop_headers,		/* execute */
> +  NULL,				/* sub */
> +  NULL,				/* next */
> +  0,				/* static_pass_number */
> +  TV_TREE_CH,			/* tv_id */
> +  PROP_cfg | PROP_ssa,		/* properties_required */
> +  0,				/* properties_provided */
> +  0,				/* properties_destroyed */
> +  0,				/* todo_flags_start */
> +  TODO_cleanup_cfg
> +    | TODO_verify_ssa
> +    | TODO_verify_flow		/* todo_flags_finish */
> + }
> +};

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend
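[Editorial illustration: for readers unfamiliar with the pass under
discussion, loop header copying (`pass_ch` / `copy_loop_headers`) peels one
copy of the loop's exit test in front of the loop, turning a test-at-top loop
into a guarded do-while.  A rough before/after sketch in plain C follows;
this is illustrative only, not GCC's GIMPLE-level transform, and the
function names are made up.]

```c
/* Before header copying: the exit test guards every iteration,
   including the first, so one branch mixes "is the loop entered at
   all?" with "does it iterate again?".  */
static int
sum_before (const int *a, int n)
{
  int s = 0, i = 0;
  while (i < n)        /* header test, executed n + 1 times */
    {
      s += a[i];
      i++;
    }
  return s;
}

/* After header copying: the first test is duplicated in front of a
   do-while loop.  Once the loop is entered, the remaining (latch)
   test almost always succeeds, which is what makes the resulting
   edge profile, and hence loop count estimates, more accurate.  */
static int
sum_after (const int *a, int n)
{
  int s = 0, i = 0;
  if (i < n)           /* duplicated loop header */
    do
      {
        s += a[i];
        i++;
      }
    while (i < n);     /* latch test */
  return s;
}
```

[The two functions are behaviorally identical; running the duplication before
profile instrumentation lets the entry test and the latch test be measured
separately, which is the motivation for gating the early pass on
profile_arc_flag or flag_branch_probabilities.]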