[Bug tree-optimization/100089] [11/12 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3

rguenth at gcc dot gnu.org via Gcc-bugs Tue, 24 Aug 2021 02:36:20 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089


--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
So if we agree on a sane way to cost branchy code on the scalar side, then it
should be possible to compare the scalar cost of the non-if-converted inner
loop body against the cost of the full, partially vectorized and if-converted
inner loop body.

vect_bb_vectorization_profitable_p would have to add the cost of the scalar
stmts not covered by vectorization - this set is conveniently available as
the set of stmts not having the visited flag set before we clear it here:

vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo,
                                    vec<slp_instance> slp_instances)
{
...
  /* Unset visited flag.  */
  stmt_info_for_cost *cost;
  FOR_EACH_VEC_ELT (scalar_costs, i, cost)
    gimple_set_visited  (cost->stmt_info->stmt, false);

so we'd need to walk over all stmts in the BB and add the cost of the
unmarked stmts to the vector cost.  We'd want to force a single
SLP "subgraph" in this mode to avoid walking the whole block
multiple times.
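
To make the accounting concrete, here is a toy model of that walk.  It is
only a sketch and not GCC code: ToyStmt, uncovered_scalar_cost and
toy_profitable_p are hypothetical names, the costs are plain integers, and
the "visited" field stands in for the gimple visited flag as it would be
read before it gets cleared.

```cpp
#include <vector>

// Hypothetical stand-in for one stmt of the if-converted block.
// This is NOT a GCC type; it only models what the walk needs.
struct ToyStmt {
  unsigned scalar_cost;  // cost of keeping this stmt scalar
  bool visited;          // true if some SLP instance covers the stmt
};

// Walk the whole block once and sum the scalar cost of every stmt
// that vectorization did not cover (visited flag not set).
unsigned uncovered_scalar_cost(const std::vector<ToyStmt>& block) {
  unsigned sum = 0;
  for (const ToyStmt& s : block)
    if (!s.visited)
      sum += s.scalar_cost;
  return sum;
}

// Compare the cost of the original, non-if-converted scalar body
// (scalar_body_cost, assumed to be computed elsewhere with whatever
// branch costing is agreed on) against the vector cost plus the cost
// of the leftover scalar stmts in the if-converted body.
bool toy_profitable_p(const std::vector<ToyStmt>& block,
                      unsigned vector_cost,
                      unsigned scalar_body_cost) {
  return vector_cost + uncovered_scalar_cost(block) < scalar_body_cost;
}
```

With a single SLP subgraph the block is walked once, so each uncovered stmt
is charged to the vector side exactly once.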

[Bug tree-optimization/100089] [11/12 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
