[Bug tree-optimization/120003] [12/13/14/15/16 Regression] missed optimization around a loop with a checker

rguenth at gcc dot gnu.org via Gcc-bugs Wed, 30 Apr 2025 02:44:45 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120003


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org|rguenth at gcc dot gnu.org

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #6)
> (In reply to Andrew Macleod from comment #4)
> > This seems to be the issue?
> > 
> >   <bb 4> [local count: 350791453]:
> >   _1 = g (i_3);
> >   if (_1 != 0)
> >     goto <bb 5>; [50.00%]
> >   else
> >     goto <bb 6>; [50.00%]
> > 
> >   <bb 5> [local count: 175395727]:
> > 
> >   <bb 6> [local count: 1063004408]:
> >   # iftmp.0_4 = PHI <1(3), 0(4), 1(5)>
> > 
> > That 3 way PHI isn't used in any threads,  so we don't get a threaded path
> > to the eventual return of 1.
> 
> The irreducible check is at least badly named - as written it does not
> make the containing loop irreducible, instead it partly unrolls things.
> 
> But with that fixed we still reject the path in
> jt_path_registry::cancel_invalid_paths by
> 
> 2840          cancel_thread (&path, "Path crosses loop header but does not exit it");
> 
> which is true again.  We can allow another subset of threads, but this
> then enables the
> 
> path: 9->6->7->3->6
> 
> path which just duplicates one iteration which does not help.
> 
> We need to create a subloop or sibling loop w/o the call.  I don't see
> offhand why this doesn't work - but then isolating a path will never
> create a new loop(?)
> 
> I've played with the following.
> 
> diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> index 23bfc14c8f0..2603d27f1f3 100644
> --- a/gcc/tree-ssa-threadbackward.cc
> +++ b/gcc/tree-ssa-threadbackward.cc
> @@ -789,6 +789,7 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
>    *creates_irreducible_loop = false;
>    if (m_threaded_through_latch
>        && loop == taken_edge->dest->loop_father
> +      && taken_edge->dest != m_path[m_path.length () - 2]
>        && (determine_bb_domination_status (loop, taken_edge->dest)
>           == DOMST_NONDOMINATING))
>      *creates_irreducible_loop = true;
> diff --git a/gcc/tree-ssa-threadupdate.cc b/gcc/tree-ssa-threadupdate.cc
> index 4e5c7566857..d91c0c7bf20 100644
> --- a/gcc/tree-ssa-threadupdate.cc
> +++ b/gcc/tree-ssa-threadupdate.cc
> @@ -2811,6 +2811,10 @@ jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
>        && flow_loop_nested_p (exit->dest->loop_father, exit->src->loop_father))
>      return false;
>  
> +  // If we thread a whole loop round-trip, we are just creating a subloop
> +  if (entry->dest == exit->dest)
> +    return false;
> +
>    if (cfun->curr_properties & PROP_loop_opts_done)
>      return false;

Note that this patch does end up restoring the optimization, but it is
not the threading itself that does it.

Instead thread2 forms the inner loop and threadfull2 then makes it a
sibling loop which cddce3 can elide.

So it is quite a complicated dance: threadfull, thread, threadfull.  The
question is why we need to iterate here and whether we can do better.  After
loop opts we only have one threadfull instance.

In particular, disabling thread2 makes threadfull2 form the inner loop
and we lose.  Disabling threadfull1 (with the above patch) makes neither
pass do any threading (not even the one I got threadfull1 to do).
Possibly that is because header copying rotated the loop into the form
below, and there we don't seem to try the cross-iteration invariance
of (retval_15 != 0) == true.  Or rather, it is possibly the lack of a
forwarder for the 3->5 edge combined with us being basic-block based,
so we only consider '3' once.

  <bb 3> [local count: 1063004408]:
  # retval_15 = PHI <prephitmp_16(7), 0(2)>
  # i_17 = PHI <i_11(7), 0(2)>
  if (retval_15 != 0)
    goto <bb 5>; [67.00%]
  else
    goto <bb 4>; [33.00%]

  <bb 4> [local count: 350791453]:
  _1 = g (i_17);

  <bb 5> [local count: 1063004408]:
  # prephitmp_16 = PHI <1(3), _1(4)>
  i_11 = i_17 + 1;
  if (i_11 != 1000000)
    goto <bb 7>; [98.99%]
  else
    goto <bb 6>; [1.01%]

  <bb 7> [local count: 1052266995]:
  goto <bb 3>; [100.00%]

  <bb 6> [local count: 10737416]:
  return prephitmp_16;

In fact fixing that fixes the regression with the help of threadfull2 + vrp2.
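
For readers who do not want to decode the dump, here is a rough C rendering
of the rotated loop above, next to the shape the threaders would have to
produce: a first loop that still calls g and a sibling loop without the call.
This is only a sketch read off the GIMPLE, not the reporter's testcase and
not what GCC literally emits.  The body of g, the counter g_calls and the
names loop_as_dumped and loop_hand_threaded are made up for illustration;
in the bug, g is an external function the compiler cannot see into.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the opaque checker g ().  It counts its
   calls so the two loop shapes below can be compared.  */
long g_calls;

bool g (int i)
{
  ++g_calls;
  return i == 41;	/* arbitrary: first nonzero result at i == 41 */
}

/* The loop as read off the dump: retval starts at 0, the call is
   skipped once retval is nonzero, but the counting continues.  */
bool loop_as_dumped (void)
{
  bool retval = false;	/* retval_15 = PHI <prephitmp_16(7), 0(2)> */
  int i = 0;		/* i_17 = PHI <i_11(7), 0(2)> */
  do
    {
      if (retval)	/* bb 3 */
	retval = true;	/* PHI argument 1(3) */
      else
	retval = g (i);	/* bb 4, PHI argument _1(4) */
      ++i;		/* bb 5 */
    }
  while (i != 1000000);
  return retval;	/* bb 6 */
}

/* Hand-threaded shape: once g () returned nonzero, control moves to a
   sibling loop that no longer tests retval or calls g ().  */
bool loop_hand_threaded (void)
{
  int i = 0;
  do
    {
      if (g (i))
	{
	  ++i;
	  goto no_call;
	}
      ++i;
    }
  while (i != 1000000);
  return false;

no_call:
  /* Sibling loop without the call.  It has no side effects, so a
     dead-code pass can remove it.  */
  while (i != 1000000)
    ++i;
  return true;
}
```

Both functions return the same value and call g the same number of times.
The difference is that in the second form the remaining iterations are an
empty counting loop, which is the kind of loop cddce3 can elide.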

[Bug tree-optimization/120003] [12/13/14/15/16 Regression] missed optimization around a loop with a checker
