> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: 28 October 2025 08:00
> To: Tamar Christina <[email protected]>
> Cc: [email protected]; nd <[email protected]>; [email protected]
> Subject: RE: [PATCH][vect]: Fix LCSSA wiring during peeling of multiple-exit loops
> 
> On Mon, 27 Oct 2025, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <[email protected]>
> > > Sent: 27 October 2025 14:27
> > > To: Tamar Christina <[email protected]>
> > > Cc: [email protected]; nd <[email protected]>; [email protected]
> > > Subject: RE: [PATCH][vect]: Fix LCSSA wiring during peeling of multiple-exit loops
> > >
> > > On Mon, 27 Oct 2025, Tamar Christina wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <[email protected]>
> > > > > Sent: 27 October 2025 13:37
> > > > > To: Tamar Christina <[email protected]>
> > > > > Cc: [email protected]; nd <[email protected]>;
> [email protected]
> > > > > Subject: Re: [PATCH][vect]: Fix LCSSA wiring during peeling of 
> > > > > multiple-
> exit
> > > > > loops
> > > > >
> > > > > On Sat, 25 Oct 2025, Tamar Christina wrote:
> > > > >
> > > > > > Arg..
> > > > > >
> > > > > > Just noticed a typo..
> > > > > >
> > > > > > BB 3:
> > > > > >   i_2 = PHI <i_1(2)>
> > > > > >
> > > > > > Should be
> > > > > >
> > > > > >
> > > > > > BB 3:
> > > > > >   i_2 = PHI <i_0(2)>
> > > > > >
> > > > > > Rest is OK. Sorry about that
> > > > > >
> > > > > > ________________________________
> > > > > > From: Tamar Christina
> > > > > > Sent: Saturday, October 25, 2025 10:52 AM
> > > > > > To: [email protected] <[email protected]>
> > > > > > Cc: nd <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>
> > > > > > Subject: [PATCH][vect]: Fix LCSSA wiring during peeling of multiple-exit loops
> > > > > >
> > > > > > Consider this Fortran testcase:
> > > > > >
> > > > > > integer (8) b, c
> > > > > >   integer  d
> > > > > >   c = 10
> > > > > >   d = 2
> > > > > >   call e ((/ (b, b = j, c, d) /), 0_8, c, d + 0_8)
> > > > > > contains
> > > > > >   subroutine e (a, f, g, h)
> > > > > >     integer (8), dimension (:) :: a
> > > > > >     integer (8)  f, g, h
> > > > > >     i = 1
> > > > > >     do b = f, g, h
> > > > > >       if (a (i) .ne. b) STOP
> > > > > >       i = i + 1
> > > > > >     end do
> > > > > >     if (size (a) .ne. i - 1) STOP 2
> > > > > >   end
> > > > > > end
> > > > > >
> > > > > > which is a reduced example from the testsuite, so it is not added as a new test.
> > > > > >
> > > > > > This testcase starts the loop at i = 1, but no peeling needs to be 
> > > > > > done.
> > > > > > As a result the loop form looks like
> > > > > >
> > > > > > BB 1:
> > > > > >   i_0 = PHI <1(initial), i_1(latch)>;
> > > > > >
> > > > > >   if (<early-break-condition>)
> > > > > >     ...
> > > > > >   else
> > > > > >     goto BB 2;
> > > > > >
> > > > > > BB 2:
> > > > > >   i_1 = i_0 + 1;
> > > > > >   if (<main-exit>)
> > > > > >     goto BB 3;
> > > > > >   goto BB 1;
> > > > > >
> > > > > > BB 3:
> > > > > >   i_2 = PHI <i_1(2)>;
> > > > > >
> > > > > > Basically in the main IV exit we don't use the value *after*
> > > > > > incrementing but rather the value before incrementing.
> > > > > >
> > > > > > At the moment during peeling we wipe away this information and
> > > > > > replace in the exit the value of i_1.  This is incorrect and means
> > > > > > we can no longer tell which value of the IV we should actually be
> > > > > > calculating for that exit.
> > > > > >
> > > > > > This isn't triggered often on trunk because of an artificial
> > > > > > inflation of the VF which causes the main exit to rarely be
> > > > > > reached, and such loops aren't common.  However, with the scalar IV
> > > > > > code this issue was brought to light.
> > > > > >
> > > > > > This patch preserves the value of the LC PHI set during edge
> > > > > > redirection.  We still end up creating the new PHI, but it now uses
> > > > > > the correct value.  The reason for still creating the duplicate PHI
> > > > > > is that there is code outside of peeling that today depends on the
> > > > > > PHI node ordering, so this was a conservative fix.  The duplicate
> > > > > > PHI gets elided later as it will be unused.
> > > > > >
> > > > > > Bootstrapped and regtested on aarch64-none-linux-gnu,
> > > > > > arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu
> > > > > > -m32, -m64 with no issues.
> > > > > >
> > > > > > Ok for master?
> > > > > >
> > > > > > Thanks,
> > > > > > Tamar
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >         * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
> > > > > >         Fix up PHI in exit nodes.
> > > > > >
> > > > > > ---
> > > > > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > > > > index 20141dbc2e54d49bca00a2e75a6733d181e05ed1..96ca273c24680556f16cdc9e465f490d7fcdb8a4 100644
> > > > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > > > @@ -1755,6 +1755,22 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
> > > > > >                      new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> > > > > >                                                       loop_latch_edge (loop));
> > > > > >
> > > > > > +                 /* Some loops start at a non-zero index, which then require
> > > > > > +                    during redirection to return i instead of i + 1.  Edge
> > > > > > +                    redirection already deals with this, but we need to make
> > > > > > +                    sure we don't override the value it puts in for these
> > > > > > +                    cases.  */
> > > > > > +                 imm_use_iterator imm_iter;
> > > > > > +                 use_operand_p use_p;
> > > > > > +                 tree from_def = PHI_RESULT (from_phi);
> > > > > > +                 bool abnormal_exit_condition = false;
> > > > > > +                 FOR_EACH_IMM_USE_FAST (use_p, imm_iter, from_def)
> > > > > > +                   if (gimple_bb (USE_STMT (use_p)) == main_loop_exit_block)
> > > > > > +                     {
> > > > > > +                       abnormal_exit_condition = true;
> > > > > > +                       break;
> > > > > > +                     }
> > > > > > +
> > > > >
> > > > > I'm a bit confused - isn't this exactly what 'peeled_iters'
> > > > > (aka LOOP_VINFO_EARLY_BREAKS_VECT_PEELED) is about?  That is,
> > > > > this loop is supposed to handle the main exit (and creating
> > > > > all required merge PHIs and eventually missing LC PHIs), so
> > > > > your patch suggests we put in the "wrong" result, but you
> > > > > leave the old 'peeled_iters' code around, even using the
> > > > > original 'caching' (but then for the wrong value?).
> > > > >
> > > >
> > > > No, peeled_iters means we want the value at the start of the vector
> > > > iteration.  The issue is that we want the value at the end of the
> > > > vector iteration - <offset>.
> > > >
> > > > So they aren't the same.  In a normal loop, this adjustment is
> > > > calculated by vect_update_ivs_after_vectorizer through niters;
> > > > because it has the right PHI node, it doesn't use the incorrect
> > > > final value of i to adjust the count.
> > > >
> > > > > At the same time the 2nd loop filling the alternate exit cases
> > > > > is always correct?
> > > >
> > > > Yes, because alternate exits always restart the loop from the start,
> > > > so the reduction for it takes you to before any adjustment at all is
> > > > done for i.
> > >
> > > So I believe the logic
> > >
> > >                   /* When the vector loop is peeled then we need to use the
> > >                      value at start of the loop, otherwise the main loop exit
> > >                      should use the final iter value.  */
> > >                   tree new_arg;
> > >                   if (peeled_iters)
> > >                     new_arg = gimple_phi_result (from_phi);
> > >                   else
> > >                     new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> > >                                                      loop_latch_edge (loop));
> > >
> > > for !peeled_iters is correct.  In this place we are looking for the
> > > initial value to use for the epilog loop - but in this case this
> > > value is different from the actual live value computed by the testcase.
> > > Note there is in general no way to associate the LC PHI [def] with
> > > the IV in the loop header if it is neither the PHI result nor the
> > > latch value - as it is in this case.  Instead the LC PHI is for
> > > an altogether different "IV" (but a related, offset one).
> > >
> > > And I see '5' being used as the continuation value via
> > > vect_update_ivs_after_vectorizer, so that seems correct.
> > >
> > > So, I can't really see what is wrong here?  The vectorized code
> > > also looks OK.  It should be possible to set up a C testcase for
> > > this, like
> > >
> >
> > As discussed on IRC, I'll move the fix somewhere else then, but it might
> > end up being more disruptive.
> 
> I'd like to have a look myself, but for this I have to be able to
> reproduce an actual miscompile.  So can you please give instructions
> on how to arrive at one?  What did you reduce the testcase from?
> What patches do I need to apply to reproduce, on which target, with
> what options?
> 

The patches to apply:
https://gcc.gnu.org/pipermail/gcc-patches/2025-October/698584.html
https://gcc.gnu.org/pipermail/gcc-patches/2025-October/698586.html

with this testcase: https://gcc.gnu.org/pipermail/gcc-patches/2025-October/698585.html

compiled with -march=armv8-a -O3.

Tamar

> Thanks,
> Richard.
> 
> > Thanks,
> > Tamar
> >
> > >  int i = 1, j;
> > >  do
> > >    {
> > >      if (...) abort ();
> > >      j = i + 1;
> > >    }
> > >  while (...);
> > >  ... use i ...
> > >
> > > Richard.
> > >
> > > > >
> > > > > Does the issue show with the fortran testcase on current trunk?
> > > > >
> > > >
> > > > Yes, the codegen on trunk today is wrong, but the main exit would
> > > > never be reached because of the increased VF.  So the test "passes"
> > > > because it takes the early exit.
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > > Thanks,
> > > > > Richard.
> > > > >
> > > > > >                    /* Check if we've already created a new phi node during
> > > > > >                       edge redirection and re-use it if so.  Otherwise
> > > > > >                       create a LC PHI node to feed the merge PHI.  */
> > > > > > @@ -1769,6 +1785,11 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
> > > > > >                      new_arg = *res;
> > > > > >                    else
> > > > > >                      {
> > > > > > +                     /* If we already found an LC PHI for the current
> > > > > > +                        PHI then just re-use it.  */
> > > > > > +                     if (abnormal_exit_condition)
> > > > > > +                       new_arg = from_def;
> > > > > > +
> > > > > >                        /* Create the LC PHI node for the exit.  */
> > > > > >                        tree new_def = copy_ssa_name (new_arg);
> > > > > >                        gphi *lc_phi
> > > > > > @@ -1779,7 +1800,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
> > > > > >
> > > > > >                    /* Create the PHI node in the merge block merging the
> > > > > >                       main and early exit values.  */
> > > > > > -                 tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > > > > > +                 tree new_res = copy_ssa_name (from_def);
> > > > > >                    gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
> > > > > >                    edge main_e = single_succ_edge (main_loop_exit_block);
> > > > > >                    SET_PHI_ARG_DEF_ON_EDGE (lcssa_phi, main_e, new_arg);
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <[email protected]>
> > > > > SUSE Software Solutions Germany GmbH,
> > > > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> > > >
> > >
> >
> 
