On Mon, Dec 9, 2024 at 11:15 PM Lewis Hyatt <lhy...@gmail.com> wrote:
>
> On Mon, Dec 09, 2024 at 02:07:07PM +0100, Richard Biener wrote:
> > On Tue, Dec 3, 2024 at 2:42 PM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> > >
> > > On Tue, Dec 3, 2024 at 2:07 PM Lewis Hyatt <lhy...@gmail.com> wrote:
> > > >
> > > > On Tue, Dec 03, 2024 at 01:28:28PM +0100, Richard Biener wrote:
> > > > > On Mon, Dec 2, 2024 at 1:55 AM Lewis Hyatt <lhy...@gmail.com> wrote:
> > > > > >
> > > > > > This patch is my new way to handle what was previously done in v2 
> > > > > > patch
> > > > > > 04/14, discussed at:
> > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669527.html
> > > > > >
> > > > > > The only places I have found running into trouble with reallocated 
> > > > > > PHI nodes
> > > > > > (at least, the only places revealed by changing the size of 
> > > > > > location_t) are
> > > > > > those which remember a gphi object before a call to loop_version() 
> > > > > > and then
> > > > > > try to use it after. I fixed three affected call sites in this new 
> > > > > > patch.
> > > > >
> > > > > I guess it might be nice to avoid PHI node re-allocation by 
> > > > > loop_version(),
> > > > > I remember fixing up other CFG manipulations to avoid the need for
> > > > > intermittent more PHI arguments.  Anway, the patch is OK besides one
> > > > > minor adjustment, see below ...
> > > > >
> > > >
> > > > Thanks again for looking at all the patches. Comment below...
> > > >
> > > > ...
> > > >
> > > > > > diff --git a/gcc/tree-vect-loop-manip.cc 
> > > > > > b/gcc/tree-vect-loop-manip.cc
> > > > > > index 54b29540eb5..4f3abb94e6a 100644
> > > > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > > > @@ -4190,6 +4190,14 @@ vect_loop_versioning (loop_vec_info 
> > > > > > loop_vinfo,
> > > > > >                             prob * prob2, (prob * prob2).invert (),
> > > > > >                             prob * prob2, (prob * prob2).invert (),
> > > > > >                             true);
> > > > > > +
> > > > > > +      /* If the PHI nodes in the loop header were reallocated, we 
> > > > > > need to fix up
> > > > > > +        our internally stashed copies of those.  */
> > > > > > +      if (loop_to_version == loop)
> > > > > > +       for (auto gsi = gsi_start_phis (loop->header);
> > > > > > +            !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > > +         loop_vinfo->resync_stmt_addr (gsi.phi ());
> > > > >
> > > > > Instead of a new resync_stmt_addr method can you simply do
> > > > >
> > > > >             if (stmt_vec_info s = loop_vinfo->lookup_stmt (gsi.phi ())
> > > > >               STMT_VINFO_STMT (s) = gsi.phi ();
> > > > >
> > > > > please?
> > > > >
> > > > > OK with that change.
> > > > >
> > > > > Thanks,
> > > > > Richard.
> > > >
> > > > The reason I added the new function is that lookup_stmt() does:
> > > >
> > > >       if (res && res->stmt == stmt)
> > > >         return res;
> > > >
> > > > so it returns NULL if the stored stmt does not match the one that was 
> > > > looked
> > > > up.
> > >
> > > Doh, I wonder what we added that for!
> > >
> > > > I could add a "bool relaxed = false" argument to lookup_stmt() instead,
> > > > or could add an alternate function lookup_stmt_relaxed() if you prefer? 
> > > > I
> > > > guess the second option would be pretty similar to what I did with
> > > > resync_stmt_addr(), just it would not actually make the change within 
> > > > the
> > > > function itself.
> > >
> > > No, in this case your patch is OK with an added comment (in your new 
> > > function)
> > > why you can't use lookup_stmt.
> >
> > Btw, I tried to look into why loop_version would ever need more PHI capacity
> > temporarily - so I did the following but it never triggers during bootstrap 
> > or
> > regtest.  That said - do you have a testcase that exercises this path?
> >
> > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > index 92ecd881508..bc3c6254fdf 100644
> > --- a/gcc/tree-vectorizer.cc
> > +++ b/gcc/tree-vectorizer.cc
> > @@ -556,6 +556,7 @@ vec_info::resync_stmt_addr (gimple *stmt)
> >        stmt_vec_info res = stmt_vec_infos[uid - 1];
> >        if (res && res->stmt)
> >         {
> > +         gcc_assert (res->stmt == stmt);
> >           res->stmt = stmt;
> >           return res;
> >         }
> >
> >
>
> Sorry, I should have mentioned the relevant testcases in the commit
> message. Reverting this whole patch would lead to the following C testcases
> breaking on sparc 32-bit (I tested on cfarm216):
>
> gcc.dg/graphite/pr81945.c
> gcc.dg/torture/pr113707-2.c
> gcc.dg/vect/bb-slp-41.c
> gcc.dg/vect/pr53185.c
> gcc.dg/vect/pr58508.c
> gcc.dg/vect/pr81740-1.c
> gcc.dg/vect/vect-100.c
> gcc.dg/vect/vect-106.c
> gcc.dg/vect/vect-alias-check-1.c
> gcc.dg/vect/vect-alias-check-6.c
> gcc.dg/vect/vect-align-2.c
>
> If, instead of reverting the whole patch, you just add the gcc_assert that
> you mentioned in tree-vectorizer.cc, then you will see it go off on those
> vect/ cases but also on many more.
>
> It seems that on 64-bit platforms there is usually enough room for 4 args in
> a newly created PHI node, which is why this reallocation doesn't usually
> happen, at least for typical examples. On sparc, the way it works out with
> the size and padding etc, you can end up with room for just the minimum 2
> PHI arguments, which then necessitates the reallocation inside
> loop_version(). resize_phi_nodes() is being called from
> reserve_phi_args_for_new_edge(), from gimple_execute_on_growing_pred(), from
> duplicate_loop_body_to_header_edge().
>
> One way to observe this on 64-bit platforms is to be less aggressive about
> preallocating extra room for the PHI args. Right now it rounds up to the
> next power of 2, but it could round up instead to the next ggc bucket
> size. Here is a quick patch that does that:
>
> =====
>
> diff --git a/gcc/tree-phinodes.cc b/gcc/tree-phinodes.cc
> index 5a7e4a94e57..e80611748aa 100644
> --- a/gcc/tree-phinodes.cc
> +++ b/gcc/tree-phinodes.cc
> @@ -143,7 +143,7 @@ static int
>  ideal_phi_node_len (int len)
>  {
>    size_t size, new_size;
> -  int log2, new_len;
> +  int new_len;
>
>    /* We do not support allocations of less than two PHI argument slots.  */
>    if (len < 2)
> @@ -153,9 +153,8 @@ ideal_phi_node_len (int len)
>    size = sizeof (struct gphi)
>          + (len - 1) * sizeof (struct phi_arg_d);
>
> -  /* Round it up to the next power of two.  */
> -  log2 = ceil_log2 (size);
> -  new_size = 1 << log2;
> +  /* Round it up to what ggc is actually going to allocate.  */
> +  new_size = ggc_round_alloc_size (size);
>
>    /* Now compute and return the number of PHI argument slots given an
>       ideal size allocation.  */
>
> =====
>
> I have not looked into that change much, but I could if you find it
> interesting (it definitely results in smaller allocations). I did verify it
> bootstraps + regtests cleanly for C++ on x86-64. With that change in place,
> then your gcc_assert in tree-vectorizer.cc would be triggering in many cases:
>
> FAIL: gcc.dg/vect/bb-slp-41.c (internal compiler error: in resync_stmt_addr, 
> at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/bb-slp-42.c (internal compiler error: in resync_stmt_addr, 
> at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/bb-slp-phis-1.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/no-trapping-math-1.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr109011-1.c (internal compiler error: in resync_stmt_addr, 
> at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr109011-2.c (internal compiler error: in resync_stmt_addr, 
> at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr109011-3.c (internal compiler error: in resync_stmt_addr, 
> at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr109011-4.c (internal compiler error: in resync_stmt_addr, 
> at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr109011-5.c (internal compiler error: in resync_stmt_addr, 
> at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr116352.c (internal compiler error: in resync_stmt_addr, 
> at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr117874.c (internal compiler error: in resync_stmt_addr, 
> at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr20122.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr23816-1.c (internal compiler error: in resync_stmt_addr, 
> at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr23816-2.c (internal compiler error: in resync_stmt_addr, 
> at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr24059.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr28952.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr37482.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr37539.c -flto -ffat-lto-objects (internal compiler error: 
> in resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr41956.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr49771.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr53185.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr56826.c -flto -ffat-lto-objects (internal compiler error: 
> in resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr56933.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr58508.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr62075.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr79887.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr80815-1.c (internal compiler error: in resync_stmt_addr, 
> at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr80815-2.c (internal compiler error: in resync_stmt_addr, 
> at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr82108.c -flto -ffat-lto-objects (internal compiler error: 
> in resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr84357.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr85586.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr85597.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr89268.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/pr92723.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/slp-45.c (internal compiler error: in resync_stmt_addr, at 
> tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/tsvc/vect-tsvc-s1421.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/tsvc/vect-tsvc-s151.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/tsvc/vect-tsvc-s422.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/tsvc/vect-tsvc-s423.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/tsvc/vect-tsvc-s424.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-77-global.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-78-global.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-alias-check-1.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-alias-check-10.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-alias-check-11.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-alias-check-12.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-alias-check-14.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-alias-check-15.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-alias-check-16.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-alias-check-18.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-alias-check-19.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-alias-check-20.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-alias-check-6.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-alias-check-7.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-alias-check-8.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-alias-check-9.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-alias-check.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-cond-11.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-cost-model-1.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-early-break_124-pr114403.c (internal compiler error: 
> in resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-early-break_23.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-early-break_24.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-early-break_80.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-live-3.c -flto -ffat-lto-objects (internal compiler 
> error: in resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-live-4.c -flto -ffat-lto-objects (internal compiler 
> error: in resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-live-5.c -flto -ffat-lto-objects (internal compiler 
> error: in resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-live-slp-2.c -flto -ffat-lto-objects (internal 
> compiler error: in resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-over-widen-1-big-array.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-over-widen-1.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-over-widen-2-big-array.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-over-widen-2.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-over-widen-3-big-array.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-over-widen-3.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-over-widen-4-big-array.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-over-widen-4.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-simd-1.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-simd-2.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-strided-epilogue-1.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-vfa-01.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-vfa-02.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-vfa-03.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-vfa-04.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-vfa-slp.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-widen-shift-s8.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
> FAIL: gcc.dg/vect/vect-widen-shift-u8.c (internal compiler error: in 
> resync_stmt_addr, at tree-vectorizer.cc:559)
>
> since then the need for PHI reallocations is much more common.
> Hope that's helpful...

So I now looked into this and the re-allocations in loop_version are
because of "wrong" order of
edge redirections.  The following seems to fix that:

diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index 534e556e1e4..17bcf9f4acc 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -1447,9 +1447,9 @@ duplicate_loop_body_to_header_edge (class loop
*loop, edge e,
        }
       else
        {
+         redirect_edge_and_branch_force (e, new_bbs[0]);
          redirect_edge_and_branch_force (new_spec_edges[SE_LATCH],
                                          loop->header);
-         redirect_edge_and_branch_force (e, new_bbs[0]);
          set_immediate_dominator (CDI_DOMINATORS, new_bbs[0], e->src);
          e = new_spec_edges[SE_LATCH];
        }

I'm going to test and push this.  As said, we should try to avoid
re-allocations (but also
edge re-orderings) where possible.

Richard.

> -Lewis

Reply via email to