On Mon, Dec 9, 2024 at 11:15 PM Lewis Hyatt <lhy...@gmail.com> wrote: > > On Mon, Dec 09, 2024 at 02:07:07PM +0100, Richard Biener wrote: > > On Tue, Dec 3, 2024 at 2:42 PM Richard Biener > > <richard.guent...@gmail.com> wrote: > > > > > > On Tue, Dec 3, 2024 at 2:07 PM Lewis Hyatt <lhy...@gmail.com> wrote: > > > > > > > > On Tue, Dec 03, 2024 at 01:28:28PM +0100, Richard Biener wrote: > > > > > On Mon, Dec 2, 2024 at 1:55 AM Lewis Hyatt <lhy...@gmail.com> wrote: > > > > > > > > > > > > This patch is my new way to handle what was previously done in v2 > > > > > > patch > > > > > > 04/14, discussed at: > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669527.html > > > > > > > > > > > > The only places I have found running into trouble with reallocated > > > > > > PHI nodes > > > > > > (at least, the only places revealed by changing the size of > > > > > > location_t) are > > > > > > those which remember a gphi object before a call to loop_version() > > > > > > and then > > > > > > try to use it after. I fixed three affected call sites in this new > > > > > > patch. > > > > > > > > > > I guess it might be nice to avoid PHI node re-allocation by > > > > > loop_version(), > > > > > I remember fixing up other CFG manipulations to avoid the need for > > > > > intermittent more PHI arguments. Anway, the patch is OK besides one > > > > > minor adjustment, see below ... > > > > > > > > > > > > > Thanks again for looking at all the patches. Comment below... > > > > > > > > ... > > > > > > > > > > diff --git a/gcc/tree-vect-loop-manip.cc > > > > > > b/gcc/tree-vect-loop-manip.cc > > > > > > index 54b29540eb5..4f3abb94e6a 100644 > > > > > > --- a/gcc/tree-vect-loop-manip.cc > > > > > > +++ b/gcc/tree-vect-loop-manip.cc > > > > > > @@ -4190,6 +4190,14 @@ vect_loop_versioning (loop_vec_info > > > > > > loop_vinfo, > > > > > > prob * prob2, (prob * prob2).invert (), > > > > > > prob * prob2, (prob * prob2).invert (), > > > > > > true); > > > > > > + > > > > > > + /* If the PHI nodes in the loop header were reallocated, we > > > > > > need to fix up > > > > > > + our internally stashed copies of those. */ > > > > > > + if (loop_to_version == loop) > > > > > > + for (auto gsi = gsi_start_phis (loop->header); > > > > > > + !gsi_end_p (gsi); gsi_next (&gsi)) > > > > > > + loop_vinfo->resync_stmt_addr (gsi.phi ()); > > > > > > > > > > Instead of a new resync_stmt_addr method can you simply do > > > > > > > > > > if (stmt_vec_info s = loop_vinfo->lookup_stmt (gsi.phi ()) > > > > > STMT_VINFO_STMT (s) = gsi.phi (); > > > > > > > > > > please? > > > > > > > > > > OK with that change. > > > > > > > > > > Thanks, > > > > > Richard. > > > > > > > > The reason I added the new function is that lookup_stmt() does: > > > > > > > > if (res && res->stmt == stmt) > > > > return res; > > > > > > > > so it returns NULL if the stored stmt does not match the one that was > > > > looked > > > > up. > > > > > > Doh, I wonder what we added that for! > > > > > > > I could add a "bool relaxed = false" argument to lookup_stmt() instead, > > > > or could add an alternate function lookup_stmt_relaxed() if you prefer? > > > > I > > > > guess the second option would be pretty similar to what I did with > > > > resync_stmt_addr(), just it would not actually make the change within > > > > the > > > > function itself. > > > > > > No, in this case your patch is OK with an added comment (in your new > > > function) > > > why you can't use lookup_stmt. > > > > Btw, I tried to look into why loop_version would ever need more PHI capacity > > temporarily - so I did the following but it never triggers during bootstrap > > or > > regtest. That said - do you have a testcase that exercises this path? > > > > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc > > index 92ecd881508..bc3c6254fdf 100644 > > --- a/gcc/tree-vectorizer.cc > > +++ b/gcc/tree-vectorizer.cc > > @@ -556,6 +556,7 @@ vec_info::resync_stmt_addr (gimple *stmt) > > stmt_vec_info res = stmt_vec_infos[uid - 1]; > > if (res && res->stmt) > > { > > + gcc_assert (res->stmt == stmt); > > res->stmt = stmt; > > return res; > > } > > > > > > Sorry, I should have mentioned the relevant testcases in the commit > message. Reverting this whole patch would lead to the following C testcases > breaking on sparc 32-bit (I tested on cfarm216): > > gcc.dg/graphite/pr81945.c > gcc.dg/torture/pr113707-2.c > gcc.dg/vect/bb-slp-41.c > gcc.dg/vect/pr53185.c > gcc.dg/vect/pr58508.c > gcc.dg/vect/pr81740-1.c > gcc.dg/vect/vect-100.c > gcc.dg/vect/vect-106.c > gcc.dg/vect/vect-alias-check-1.c > gcc.dg/vect/vect-alias-check-6.c > gcc.dg/vect/vect-align-2.c > > If, instead of reverting the whole patch, you just add the gcc_assert that > you mentioned in tree-vectorizer.cc, then you will see it go off on those > vect/ cases but also on many more. > > It seems that on 64-bit platforms there is usually enough room for 4 args in > a newly created PHI node, which is why this reallocation doesn't usually > happen, at least for typical examples. On sparc, the way it works out with > the size and padding etc, you can end up with room for just the minimum 2 > PHI arguments, which then necessitates the reallocation inside > loop_version(). resize_phi_nodes() is being called from > reserve_phi_args_for_new_edge(), from gimple_execute_on_growing_pred(), from > duplicate_loop_body_to_header_edge(). > > One way to observe this on 64-bit platforms is to be less aggressive about > preallocating extra room for the PHI args. Right now it rounds up to the > next power of 2, but it could round up instead to the next ggc bucket > size. Here is a quick patch that does that: > > ===== > > diff --git a/gcc/tree-phinodes.cc b/gcc/tree-phinodes.cc > index 5a7e4a94e57..e80611748aa 100644 > --- a/gcc/tree-phinodes.cc > +++ b/gcc/tree-phinodes.cc > @@ -143,7 +143,7 @@ static int > ideal_phi_node_len (int len) > { > size_t size, new_size; > - int log2, new_len; > + int new_len; > > /* We do not support allocations of less than two PHI argument slots. */ > if (len < 2) > @@ -153,9 +153,8 @@ ideal_phi_node_len (int len) > size = sizeof (struct gphi) > + (len - 1) * sizeof (struct phi_arg_d); > > - /* Round it up to the next power of two. */ > - log2 = ceil_log2 (size); > - new_size = 1 << log2; > + /* Round it up to what ggc is actually going to allocate. */ > + new_size = ggc_round_alloc_size (size); > > /* Now compute and return the number of PHI argument slots given an > ideal size allocation. */ > > ===== > > I have not looked into that change much, but I could if you find it > interesting (it definitely results in smaller allocations). I did verify it > bootstraps + regtests cleanly for C++ on x86-64. With that change in place, > then your gcc_assert in tree-vectorizer.cc would be triggering in many cases: > > FAIL: gcc.dg/vect/bb-slp-41.c (internal compiler error: in resync_stmt_addr, > at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/bb-slp-42.c (internal compiler error: in resync_stmt_addr, > at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/bb-slp-phis-1.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/no-trapping-math-1.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr109011-1.c (internal compiler error: in resync_stmt_addr, > at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr109011-2.c (internal compiler error: in resync_stmt_addr, > at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr109011-3.c (internal compiler error: in resync_stmt_addr, > at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr109011-4.c (internal compiler error: in resync_stmt_addr, > at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr109011-5.c (internal compiler error: in resync_stmt_addr, > at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr116352.c (internal compiler error: in resync_stmt_addr, > at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr117874.c (internal compiler error: in resync_stmt_addr, > at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr20122.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr23816-1.c (internal compiler error: in resync_stmt_addr, > at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr23816-2.c (internal compiler error: in resync_stmt_addr, > at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr24059.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr28952.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr37482.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr37539.c -flto -ffat-lto-objects (internal compiler error: > in resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr41956.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr49771.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr53185.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr56826.c -flto -ffat-lto-objects (internal compiler error: > in resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr56933.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr58508.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr62075.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr79887.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr80815-1.c (internal compiler error: in resync_stmt_addr, > at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr80815-2.c (internal compiler error: in resync_stmt_addr, > at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr82108.c -flto -ffat-lto-objects (internal compiler error: > in resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr84357.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr85586.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr85597.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr89268.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/pr92723.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/slp-45.c (internal compiler error: in resync_stmt_addr, at > tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/tsvc/vect-tsvc-s1421.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/tsvc/vect-tsvc-s151.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/tsvc/vect-tsvc-s422.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/tsvc/vect-tsvc-s423.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/tsvc/vect-tsvc-s424.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-77-global.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-78-global.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-alias-check-1.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-alias-check-10.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-alias-check-11.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-alias-check-12.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-alias-check-14.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-alias-check-15.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-alias-check-16.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-alias-check-18.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-alias-check-19.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-alias-check-20.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-alias-check-6.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-alias-check-7.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-alias-check-8.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-alias-check-9.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-alias-check.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-cond-11.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-cost-model-1.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-early-break_124-pr114403.c (internal compiler error: > in resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-early-break_23.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-early-break_24.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-early-break_80.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-live-3.c -flto -ffat-lto-objects (internal compiler > error: in resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-live-4.c -flto -ffat-lto-objects (internal compiler > error: in resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-live-5.c -flto -ffat-lto-objects (internal compiler > error: in resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-live-slp-2.c -flto -ffat-lto-objects (internal > compiler error: in resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-over-widen-1-big-array.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-over-widen-1.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-over-widen-2-big-array.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-over-widen-2.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-over-widen-3-big-array.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-over-widen-3.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-over-widen-4-big-array.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-over-widen-4.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-simd-1.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-simd-2.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-strided-epilogue-1.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-vfa-01.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-vfa-02.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-vfa-03.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-vfa-04.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-vfa-slp.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-widen-shift-s8.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > FAIL: gcc.dg/vect/vect-widen-shift-u8.c (internal compiler error: in > resync_stmt_addr, at tree-vectorizer.cc:559) > > since then the need for PHI reallocations is much more common. > Hope that's helpful...
So I now looked into this and the re-allocations in loop_version are because of "wrong" order of edge redirections. The following seems to fix that: diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc index 534e556e1e4..17bcf9f4acc 100644 --- a/gcc/cfgloopmanip.cc +++ b/gcc/cfgloopmanip.cc @@ -1447,9 +1447,9 @@ duplicate_loop_body_to_header_edge (class loop *loop, edge e, } else { + redirect_edge_and_branch_force (e, new_bbs[0]); redirect_edge_and_branch_force (new_spec_edges[SE_LATCH], loop->header); - redirect_edge_and_branch_force (e, new_bbs[0]); set_immediate_dominator (CDI_DOMINATORS, new_bbs[0], e->src); e = new_spec_edges[SE_LATCH]; } I'm going to test and push this. As said, we should try to avoid re-allocations (but also edge re-orderings) where possible. Richard. > -Lewis