On Tue, May 11, 2021 at 9:10 AM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> Hi Richi,
>
> on 2021/5/10 9:55 PM, Richard Biener wrote:
> > On Sat, May 8, 2021 at 10:05 AM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>
> >> Hi Richi,
> >>
> >> Thanks for the comments!
> >>
> >> on 2021/5/7 5:43 PM, Richard Biener wrote:
> >>> On Fri, May 7, 2021 at 5:30 AM Kewen.Lin via Gcc-patches
> >>> <gcc-patches@gcc.gnu.org> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> When I was investigating density_test heuristics, I noticed that
> >>>> the current rs6000_density_test could be used for single scalar
> >>>> iteration cost calculation, through the call trace:
> >>>>   vect_compute_single_scalar_iteration_cost
> >>>>     -> rs6000_finish_cost
> >>>>          -> rs6000_density_test
> >>>>
> >>>> This looks unexpected per its descriptive function comments, and Bill
> >>>> helped to confirm that it needs to be fixed (thanks!).
> >>>>
> >>>> So this patch checks the passed data: if it is the same as the one in
> >>>> loop_vinfo, we are costing the vector version; otherwise we just
> >>>> return early.
> >>>>
> >>>> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
> >>>>
> >>>> Nothing remarkable was observed with a SPEC2017 full run on Power9.
> >>>>
> >>>> Is it ok for trunk?
> >>>
> >>> +  /* Only care about cost of vector version, so exclude scalar
> >>> +     version here.  */
> >>> +  if (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) != (void *) data)
> >>> +    return;
> >>>
> >>> Hmm, that looks like quite a "random" test to me.  What about adding a
> >>> parameter to finish_cost () (or init_cost?) indicating the cost kind?
> >>>
> >>
> >> I originally wanted to change the hook interface, but noticed that
> >> the finish_cost call in vect_estimate_min_profitable_iters is the
> >> only invocation with LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), so
> >> checking it looked sufficient to differentiate scalar version costing
> >> from vector version costing for a loop.  Do you mean this observation/
> >> assumption could easily be broken sometime later?
> >
> > Yes, this field is likely to become stale.
> >
> >>
> >> The attached patch adds one new parameter to indicate the costing
> >> kind explicitly, as you suggested.
> >>
> >> Does it look better?
> >>
> >> gcc/ChangeLog:
> >>
> >>         * doc/tm.texi: Regenerated.
> >>         * target.def (init_cost): Add new parameter costing_for_scalar.
> >>         * targhooks.c (default_init_cost): Adjust for new parameter.
> >>         * targhooks.h (default_init_cost): Likewise.
> >>         * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Likewise.
> >>         (vect_compute_single_scalar_iteration_cost): Likewise.
> >>         (vect_analyze_loop_2): Likewise.
> >>         * tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Likewise.
> >>         (vect_bb_vectorization_profitable_p): Likewise.
> >>         * tree-vectorizer.h (init_cost): Likewise.
> >>         * config/aarch64/aarch64.c (aarch64_init_cost): Likewise.
> >>         * config/i386/i386.c (ix86_init_cost): Likewise.
> >>         * config/rs6000/rs6000.c (rs6000_init_cost): Likewise.
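
For context, a rough sketch of what the adjusted default hook in targhooks.c
might look like with the new flag (the parameter handling shown here is
illustrative, not copied from the committed patch); the default can simply
ignore costing_for_scalar, while a target like rs6000 can record it:

  /* Sketch only: default init_cost with the new costing_for_scalar flag;
     the default implementation does not need to act on it.  */
  void *
  default_init_cost (class loop *loop_info ATTRIBUTE_UNUSED,
                     bool costing_for_scalar ATTRIBUTE_UNUSED)
  {
    unsigned *cost = XNEWVEC (unsigned, 3);
    cost[vect_prologue] = cost[vect_body] = cost[vect_epilogue] = 0;
    return cost;
  }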
> >>
> >>> OTOH we already pass scalar_stmt to individual add_stmt_cost,
> >>> so not sure whether the context really matters.  That said,
> >>> the density test looks "interesting" ... the intent was that finish_cost
> >>> might look at gathered data from add_stmt, not that it looks at
> >>> the GIMPLE IL ... so why are you not counting vector_stmt vs.
> >>> scalar_stmt entries in vect_body and using that for this metric?
> >>>
> >>
> >> Good to know the intention behind finish_cost, thanks!
> >>
> >> I'm afraid that checking the vector_stmt and scalar_stmt entries
> >> from add_stmt_cost doesn't work for the density test here.  The
> >> density test focuses on the vector version itself; there are some
> >> stmts whose relevance is marked as vect_unused_in_scope, and IIUC
> >> they won't be passed down when costing either version.  But the
> >> existing density check wants to know the cost of that non-vectorized
> >> part.  The current implementation does roughly:
> >>
> >>   vec_cost = data->cost[vect_body];
> >>
> >>   /* accumulated over all stmts in the loop body */
> >>   if (!STMT_VINFO_RELEVANT_P (stmt_info)
> >>       && !STMT_VINFO_IN_PATTERN_P (stmt_info))
> >>     not_vec_cost++;
> >>
> >>   density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);
> >>
> >> it takes those irrelevant stmts into account, and with both the
> >> cost of the non-vectorized part (not_vec_cost) and of the
> >> vectorized part (cost[vect_body]) it can calculate the vectorized
> >> code density ratio.
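
(For reference, the check described above amounts to roughly the following
inside rs6000_density_test; the loop walking is standard GCC idiom, but the
threshold handling and exact bookkeeping are simplified here:)

  unsigned vec_cost = data->cost[vect_body], not_vec_cost = 0;
  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
  unsigned nbbs = LOOP_VINFO_LOOP (loop_vinfo)->num_nodes;

  for (unsigned i = 0; i < nbbs; i++)
    for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]);
         !gsi_end_p (gsi); gsi_next (&gsi))
      {
        stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (gsi_stmt (gsi));
        /* Stmts the vectorizer leaves alone (typically IV updates) count
           towards the non-vectorized side.  */
        if (stmt_info
            && !STMT_VINFO_RELEVANT_P (stmt_info)
            && !STMT_VINFO_IN_PATTERN_P (stmt_info))
          not_vec_cost++;
      }

  unsigned density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);
  /* A density_pct above the target's threshold marks the loop as "dense"
     and penalizes its vector body cost.  */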
> >
> > Yes, but then which "relevant" stmts are actually needed and which
> > are not is missed by your heuristic.  It's really a GIGO one,
> > I fear - each vectorized data reference will add a pointer IV
> > (eventually commoned by IVOPTs later) and pointer value updates
> > that are not accounted for in costing (the IVs and updates in the
> > scalar code are marked as not relevant).  Are those the stmts
> > this heuristic wants to look at?
>
> Yes, the IVs and updates (even the comparison for the exit) are what
> the heuristic tries to count.  In most cases, the non-vectorized
> part in the loop consists of IV updates.  It's true that the
> collected not_vec_cost may not be accurate, but it seems hard
> to predict the cost exactly here?
>
> Assuming this not_vec_cost is overpriced, it could result
> in a lower density ratio than it should be.  Also assuming
> the density threshold is relatively conservative, if the ratio
> still exceeds the density threshold in this case, we can say the
> loop is really dense.  It could miss catching some "dense" loops,
> but I hope it won't treat "non-dense" loops as "dense" unexpectedly.
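
(With made-up numbers to illustrate that monotonicity: if vec_cost is 80
and the true not_vec_cost is 10, the ratio is 80 * 100 / 90 ~= 89%;
overpricing not_vec_cost to 20 drops it to 80 * 100 / 100 = 80%.  So
overpricing the non-vectorized part can only lower the ratio, i.e. it can
only make the heuristic more conservative.)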

So we could in principle include IVs and updates in the costing, but
then the vectorizer isn't absolutely careful about doing scalar cleanups
and instead expects IVOPTs to create canonical IVs.  Note that for
the scalar part those stmts are not costed either; we'd have to
change that as well.  What this would mean is that for a scalar
loop accessing a[i] and b[i] we'd have one original IV + update
while the vectorizer generates two pointer IVs + updates.
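
A hand-written illustration of that point (not compiler output; the
unroll factor of 2 and the function names are made up, and n is assumed
to be a multiple of 2):

  /* Scalar version: a single counter IV `i', one update per iteration.  */
  void
  scalar_loop (double *a, double *b, int n)
  {
    for (int i = 0; i < n; i++)
      a[i] += b[i];
  }

  /* Vectorized-style version: one pointer IV per data reference (`ap'
     and `bp'), each with its own update, until IVOPTs possibly commons
     them later.  */
  void
  vector_style_loop (double *a, double *b, int n)
  {
    double *ap = a, *bp = b;
    for (int i = 0; i < n; i += 2, ap += 2, bp += 2)
      {
        /* stands in for a vector load/add/store of two elements */
        ap[0] += bp[0];
        ap[1] += bp[1];
      }
  }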

But in the end the vector code shouldn't end up worse than the
scalar code with respect to IVs - the cases where it would should
already be costed.  So I wonder whether you have specific examples
where things get bad enough for the heuristic to trigger?

Thanks,
Richard.

>
> >
> > The patch looks OK btw.
> >
>
> Thanks for the review!
>
> The patch was bootstrapped/reg-tested on powerpc64le-linux-gnu P9,
> x86_64-redhat-linux and aarch64-linux-gnu.  Committed in r12-704.
>
>
> BR,
> Kewen
