On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> When performing early break vectorization we need to be sure that the vector
> operations are safe to perform.  A simple example is e.g.
> 
>  for (int i = 0; i < N; i++)
>  {
>    vect_b[i] = x + i;
>    if (vect_a[i]*2 != x)
>      break;
>    vect_a[i] = x;
>  }
> 
> where the store to vect_b is not allowed to be executed unconditionally since
> if we exit through the early break it wouldn't have been done for the full VF
> iteration.
> 
> Effective the code motion determines:
>   - is it safe/possible to vectorize the function
>   - what updates to the VUSES should be performed if we do
>   - Which statements need to be moved
>   - Which statements can't be moved:
>     * values that are live must be reachable through all exits
>     * values that aren't single use and shared by the use/def chain of the 
> cond
>   - The final insertion point of the instructions.  In the cases we have
>     multiple early exist statements this should be the one closest to the loop
>     latch itself.
> 
> After motion the loop above is:
> 
>  for (int i = 0; i < N; i++)
>  {
>    ... y = x + i;
>    if (vect_a[i]*2 != x)
>      break;
>    vect_b[i] = y;
>    vect_a[i] = x;
> 
>  }
> 
> The operation is split into two, during data ref analysis we determine
> validity of the operation and generate a worklist of actions to perform if we
> vectorize.
> 
> After peeling and just before statetement tranformation we replay this 
> worklist
> which moves the statements and updates book keeping only in the main loop 
> that's
> to be vectorized.  This includes updating of USES in exit blocks.
> 
> At the moment we don't support this for epilog nomasks since the additional
> vectorized epilog's stmt UIDs are not found.

As of UIDs note that UIDs are used for dominance checking in
vect_stmt_dominates_stmt_p and that at least is used during
transform when scheduling SLP.  Moving stmts around invalidates
this UID order (I don't see you "renumbering" UIDs).

More comments below.
 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
>
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>       * tree-vect-data-refs.cc (validate_early_exit_stmts): New.
>       (vect_analyze_early_break_dependences): New.
>       (vect_analyze_data_ref_dependences): Use them.
>       * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
>       early_breaks.
>       (move_early_exit_stmts): New.
>       (vect_transform_loop): use it/
>       * tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
>       * tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
>       (class _loop_vec_info): Add early_breaks, early_break_conflict,
>       early_break_vuses.
>       (LOOP_VINFO_EARLY_BREAKS): New.
>       (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS): New.
>       (LOOP_VINFO_EARLY_BRK_DEST_BB): New.
>       (LOOP_VINFO_EARLY_BRK_VUSES): New.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index 
> d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..0fc4f325980be0474f628c32b9ce7be77f3e1d60
>  100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -613,6 +613,332 @@ vect_analyze_data_ref_dependence (struct 
> data_dependence_relation *ddr,
>    return opt_result::success ();
>  }
>  
> +/* This function tries to validate whether an early break vectorization
> +   is possible for the current instruction sequence. Returns True i
> +   possible, otherwise False.
> +
> +   Requirements:
> +     - Any memory access must be to a fixed size buffer.
> +     - There must not be any loads and stores to the same object.
> +     - Multiple loads are allowed as long as they don't alias.
> +
> +   NOTE:
> +     This implemementation is very conservative. Any overlappig loads/stores
> +     that take place before the early break statement gets rejected aside 
> from
> +     WAR dependencies.
> +
> +     i.e.:
> +
> +     a[i] = 8
> +     c = a[i]
> +     if (b[i])
> +       ...
> +
> +     is not allowed, but
> +
> +     c = a[i]
> +     a[i] = 8
> +     if (b[i])
> +       ...
> +
> +     is which is the common case.
> +
> +   Arguments:
> +     - LOOP_VINFO: loop information for the current loop.
> +     - CHAIN: Currently detected sequence of instructions that need to be 
> moved
> +           if we are to vectorize this early break.
> +     - FIXED: Sequences of SSA_NAMEs that must not be moved, they are 
> reachable from
> +           one or more cond conditions.  If this set overlaps with CHAIN 
> then FIXED
> +           takes precedence.  This deals with non-single use cases.
> +     - LOADS: List of all loads found during traversal.
> +     - BASES: List of all load data references found during traversal.
> +     - GSTMT: Current position to inspect for validity.  The sequence
> +           will be moved upwards from this point.
> +     - REACHING_VUSE: The dominating VUSE found so far.  */
> +
> +static bool
> +validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree> *chain,
> +                        hash_set<tree> *fixed, vec<tree> *loads,
> +                        vec<data_reference *> *bases, tree *reaching_vuse,
> +                        gimple_stmt_iterator *gstmt)
> +{
> +  if (gsi_end_p (*gstmt))
> +    return true;
> +
> +  gimple *stmt = gsi_stmt (*gstmt);
> +  /* ?? Do I need to move debug statements? not quite sure..  */

I think we reset them.

> +  if (gimple_has_ops (stmt)
> +      && !is_gimple_debug (stmt))
> +    {
> +      tree dest = NULL_TREE;
> +      /* Try to find the SSA_NAME being defined.  For Statements with an LHS
> +      use the LHS, if not, assume that the first argument of a call is the
> +      value being defined.  e.g. MASKED_LOAD etc.  */
> +      if (gimple_has_lhs (stmt))
> +     dest = gimple_get_lhs (stmt);
> +      else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> +     dest = gimple_arg (call, 0);
> +      else if (const gcond *cond = dyn_cast <const gcond *> (stmt))
> +     {
> +       /* Operands of conds are ones we can't move.  */
> +       fixed->add (gimple_cond_lhs (cond));
> +       fixed->add (gimple_cond_rhs (cond));
> +     }
> +
> +      bool move = false;


So this all looks a bit like spaghetti (sorry).  I think what
you want to do is perform this in two steps:

 a) mark (and check) the dependences of the early break conditions,
    aka populate 'fixed'
 b) walk stmts from the _last_ early break, verifying all earlier
    non-'fixed' stmts can be moved

for marking dependences you want to simply iterate over use
operands:

  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
    USE_FROM_PTR (use_p) then is a SSA name that's used by 'stmt',
    the SSA_NAME_DEF_STMT of it is the next stmt to visit.  Use
    a worklist with a visited set to gather all of the relevant
    stmts/defs

> +      stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> +      if (!stmt_vinfo)
> +     {
> +        if (dump_enabled_p ())
> +          dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                           "early breaks not supported. Unknown"
> +                           " statement: %G", stmt);
> +        return false;
> +     }
> +
> +      auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> +      if (dr_ref)
> +     {
> +        /* We currently only support statically allocated objects due to
> +           not having first-faulting loads support or peeling for alignment
> +           support.  Compute the size of the referenced object (it could be
> +           dynamically allocated).  */
> +        tree obj = DR_BASE_ADDRESS (dr_ref);
> +        if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> +          {
> +            if (dump_enabled_p ())
> +              dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                               "early breaks only supported on statically"
> +                               " allocated objects.\n");
> +            return false;
> +          }
> +
> +        tree refop = TREE_OPERAND (obj, 0);
> +        tree refbase = get_base_address (refop);
> +        if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
> +            || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> +          {
> +            if (dump_enabled_p ())
> +              dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                               "early breaks only supported on statically"
> +                               " allocated objects.\n");
> +            return false;
> +          }

Note this doesn't ensure in-bound accesses:

int a[4];

void foo ()
{
  for (unsigned int i = 0; i < 32; ++i)
   {
     if (a[i] == 0)
       break;
     /* ... */
   }
}

you'd happily load a V8SImode vector from 'a'.  If the caller
ensures a[3] == 0 the code is fine but your transformed vector
code not.  You need to check that DR_BASE_ADDRESS + DR_OFFSET
+ DR_INIT + niter * DR_STEP is within the object instead.

> +
> +        if (DR_IS_READ (dr_ref))
> +          {
> +             loads->safe_push (dest);
> +             bases->safe_push (dr_ref);
> +          }
> +        else if (DR_IS_WRITE (dr_ref))
> +          {
> +             for (auto dr : bases)
> +               if (same_data_refs_base_objects (dr, dr_ref))
> +                 {

that looks quadratic to me.  So what's this actually?  You've
gathered all loads after this write and now you are checking
that all those loads do not alias the write?  But
same_data_refs_base_objects is only verifying that the
two refs are suitable for classical dependence analysis,
so it's not a conservative test here.  I think you may want to
use dr_may_alias_p instead?

I'm missing some overall idea of what you are doing, like what's
the actual transform and how do you validate its validity?

It looks like you only move stores?

> +                   if (dump_enabled_p ())
> +                       dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> +                                        vect_location,
> +                                        "early breaks only supported,"
> +                                        " overlapping loads and stores found"
> +                                        " before the break statement.\n");
> +                   return false;
> +                 }
> +             /* Any writes starts a new chain. */
> +             move = true;
> +          }
> +     }
> +
> +      /* If a statement is live and escapes the loop through usage in the 
> loop
> +      epilogue then we can't move it since we need to maintain its
> +      reachability through all exits.  */
> +      bool skip = false;
> +      if (STMT_VINFO_LIVE_P (stmt_vinfo)
> +       && !(dr_ref && DR_IS_WRITE (dr_ref)))
> +     {
> +       imm_use_iterator imm_iter;
> +       use_operand_p use_p;
> +       FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
> +         {
> +           basic_block bb = gimple_bb (USE_STMT (use_p));
> +           skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> +           if (skip)
> +             break;
> +         }
> +     }
> +
> +      /* If we found the defining statement of a something that's part of the
> +      chain then expand the chain with the new SSA_VARs being used.  */
> +      if (!skip && (chain->contains (dest) || move))
> +     {
> +       move = true;
> +       for (unsigned x = 0; x < gimple_num_args (stmt); x++)
> +         {
> +           tree var = gimple_arg (stmt, x);
> +           if (TREE_CODE (var) == SSA_NAME)
> +             {
> +               if (fixed->contains (dest))
> +                 {
> +                   move = false;
> +                   fixed->add (var);
> +                 }
> +               else
> +                 chain->add (var);
> +             }
> +           else
> +             {
> +               use_operand_p use_p;
> +               ssa_op_iter iter;
> +               FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
> +                 {
> +                   tree op = USE_FROM_PTR (use_p);
> +                   gcc_assert (TREE_CODE (op) == SSA_NAME);
> +                   if (fixed->contains (dest))
> +                     {
> +                       move = false;
> +                       fixed->add (op);
> +                     }
> +                   else
> +                     chain->add (op);
> +                 }
> +             }
> +         }
> +
> +       if (dump_enabled_p ())
> +         {
> +           if (move)
> +             dump_printf_loc (MSG_NOTE, vect_location,
> +                             "found chain %G", stmt);
> +           else
> +             dump_printf_loc (MSG_NOTE, vect_location,
> +                             "ignored chain %G, not single use", stmt);
> +         }
> +     }
> +
> +      if (move)
> +     {
> +       if (dump_enabled_p ())
> +         dump_printf_loc (MSG_NOTE, vect_location,
> +                          "==> recording stmt %G", stmt);
> +
> +       for (tree ref : loads)
> +         if (stmt_may_clobber_ref_p (stmt, ref, true))
> +           {
> +             if (dump_enabled_p ())
> +               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                                "early breaks not supported as memory used"
> +                                " may alias.\n");
> +             return false;
> +           }

Here you check aliasing again?!

I think it might be conceptually easier (and stronger) to instead
think of the 'fixed' set (and the gconds) to be moved earlier instead
of the stores to be sunk.

For example I fail to see how you check for, say

   for (..)
    {
      tem = a[i] / b[i];
      if (c[i]) break;
      d[i] = tem;
    }

where the division might trap.  For this the validation wouldn't
identify anything to move, right?

I'll note that doing the actual movement will be easier with SLP
and it would be a possibility to implement early break with just
SLP support - as we need to start discovery from the gconds
explicitly anyway there's no problem forcing a single-lane SLP
discovery there.

> +
> +       /* If we've moved a VDEF, extract the defining MEM and update
> +          usages of it.   */
> +       tree vdef;
> +       if ((vdef = gimple_vdef (stmt)))
> +         {
> +           /* This statement is to be moved.  */
> +           LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (stmt);
> +           *reaching_vuse = gimple_vuse (stmt);
> +         }
> +     }
> +    }
> +
> +  gsi_prev (gstmt);
> +
> +  if (!validate_early_exit_stmts (loop_vinfo, chain, fixed, loads, bases,
> +                               reaching_vuse, gstmt))
> +    return false;

Please use a loop instead of recursion.  I suggest to do the loop at the 
single caller.

> +  if (gimple_vuse (stmt) && !gimple_vdef (stmt))
> +    {
> +      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_push (stmt);
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_NOTE, vect_location,
> +                        "marked statement for vUSE update: %G", stmt);
> +    }
> +
> +  return true;
> +}
> +
> +/* Funcion vect_analyze_early_break_dependences.
> +
> +   Examime all the data references in the loop and make sure that if we have
> +   mulitple exits that we are able to safely move stores such that they 
> become
> +   safe for vectorization.  The function also calculates the place where to 
> move
> +   the instructions to and computes what the new vUSE chain should be.
> +
> +   This works in tandem with the CFG that will be produced by
> +   slpeel_tree_duplicate_loop_to_edge_cfg later on.  */
> +
> +static opt_result
> +vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> +{
> +  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
> +
> +  hash_set<tree> chain, fixed;
> +  auto_vec<tree> loads;
> +  auto_vec<data_reference *> bases;
> +  basic_block dest_bb = NULL;
> +  tree vuse = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +                  "loop contains multiple exits, analyzing"
> +                  " statement dependencies.\n");
> +
> +  for (gcond *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
> +    {
> +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
> +      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
> +     continue;
> +
> +      gimple *stmt = STMT_VINFO_STMT (loop_cond_info);

isn't that 'c' already?

> +      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> +
> +      /* Initiaze the vuse chain with the one at the early break.  */
> +      if (!vuse)
> +     vuse = gimple_vuse (c);

gconds do not have virtual operands.

> +
> +      if (!validate_early_exit_stmts (loop_vinfo, &chain, &fixed, &loads,
> +                                  &bases, &vuse, &gsi))
> +     return opt_result::failure_at (stmt,
> +                                    "can't safely apply code motion to "
> +                                    "dependencies of %G to vectorize "
> +                                    "the early exit.\n", stmt);
> +
> +      /* Save destination as we go, BB are visited in order and the last one
> +     is where statements should be moved to.  */
> +      if (!dest_bb)
> +     dest_bb = gimple_bb (c);
> +      else
> +     {
> +       basic_block curr_bb = gimple_bb (c);
> +       if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> +         dest_bb = curr_bb;
> +     }
> +    }
> +
> +  dest_bb = FALLTHRU_EDGE (dest_bb)->dest;

no edge is the fallthru edge out of a condition, so this always
selects EDGE_SUCC (dest_bb, 1) which cannot be correct (well,
guess you're lucky).  I think you instead want

  dest_bb = EDGE_SUCC (dest_bb, 0)->dest->loop_father == 
dest_bb->loop_father ? EDGE_SUCC (dest_bb, 0)->dest : EDGE_SUCC (dest_bb, 
1)->dest;

more nicely written, of course.

> +  gcc_assert (dest_bb);
> +  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;

Sorting the vector of early breaks as we gather them might be nicer
than this - you'd then simply use the first or last.

> +
> +  /* TODO: Remove? It's useful debug statement but may be too much.  */
> +  for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> +    {
> +      if (dump_enabled_p ())
> +     dump_printf_loc (MSG_NOTE, vect_location,
> +                      "updated use: %T, mem_ref: %G",
> +                      vuse, g);
> +    }
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +                  "recorded statements to be moved to BB %d\n",
> +                  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
> +
> +  return opt_result::success ();
> +}
> +
>  /* Function vect_analyze_data_ref_dependences.
>  
>     Examine all the data references in the loop, and make sure there do not
> @@ -657,6 +983,11 @@ vect_analyze_data_ref_dependences (loop_vec_info 
> loop_vinfo,
>         return res;
>        }
>  
> +  /* If we have early break statements in the loop, check to see if they
> +     are of a form we can vectorizer.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    return vect_analyze_early_break_dependences (loop_vinfo);
> +
>    return opt_result::success ();
>  }
>  
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 
> 40f167d279589a5b97f618720cfbc0d41b7f2342..c123398aad207082384a2079c5234033c3d825ea
>  100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
> vec_info_shared *shared)
>      partial_load_store_bias (0),
>      peeling_for_gaps (false),
>      peeling_for_niter (false),
> +    early_breaks (false),
>      no_data_dependencies (false),
>      has_mask_store (false),
>      scalar_loop_scaling (profile_probability::uninitialized ()),
> @@ -11392,6 +11393,55 @@ update_epilogue_loop_vinfo (class loop *epilogue, 
> tree advance)
>    epilogue_vinfo->shared->save_datarefs ();
>  }
>  
> +/*  When vectorizing early break statements instructions that happen before
> +    the early break in the current BB need to be moved to after the early
> +    break.  This function deals with that and assumes that any validity
> +    checks has already been performed.
> +
> +    While moving the instructions if it encounters a VUSE or VDEF it then
> +    corrects the VUSES as it moves the statements along.  GDEST is the 
> location
> +    in which to insert the new statements.  */
> +
> +static void
> +move_early_exit_stmts (loop_vec_info loop_vinfo)
> +{
> +  if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ())
> +    return;
> +
> +  /* Move all stmts that need moving.  */
> +  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);

I suppose dest_bb is the in-loop block following the last early
exit?  I suppose we do not support an "early" exit after the
main IV exit, right?  Instead we'd require loop rotation?

> +  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
> +
> +  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo))
> +    {
> +      /* Check to see if statement is still required for vect or has been
> +      elided.  */
> +      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
> +      if (!stmt_info)
> +     continue;
> +
> +      if (dump_enabled_p ())
> +     dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
> +
> +      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
> +      gsi_move_before (&stmt_gsi, &dest_gsi);
> +      gsi_prev (&dest_gsi);
> +      update_stmt (stmt);

You shouldn't need to update_stmt here I think.

> +    }
> +
> +  /* Update all the stmts with their new reaching VUSES.  */
> +  tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS 
> (loop_vinfo).last ());
> +  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> +    {
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_NOTE, vect_location,
> +                        "updating vuse to %T for stmt %G", vuse, p);
> +      unlink_stmt_vdef (p);

it's odd to first move the stmts and then propagate out their defs
(which you forget to release?)

> +      gimple_set_vuse (p, vuse);

and now every store gets the same vuse?  I'm quite sure you'll end
up with broken virtual SSA form here.

> +      update_stmt (p);
> +    }
> +}
> +
>  /* Function vect_transform_loop.
>  
>     The analysis phase has determined that the loop is vectorizable.
> @@ -11541,6 +11591,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, 
> gimple *loop_vectorized_call)
>        vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
>      }
>  
> +  /* Handle any code motion that we need to for early-break vectorization 
> after
> +     we've done peeling but just before we start vectorizing.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    move_early_exit_stmts (loop_vinfo);
> +
>    /* FORNOW: the vectorizer supports only loops which body consist
>       of one basic block (header + empty latch). When the vectorizer will
>       support more involved loop forms, the order by which the BBs are
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 
> 99ba75e98c0d185edd78c7b8b9947618d18576cc..42cebb92789247434a91cb8e74c0557e75d1ea2c
>  100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -13511,6 +13511,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, 
> enum vect_def_type *dt,
>       case vect_first_order_recurrence:
>         dump_printf (MSG_NOTE, "first order recurrence\n");
>         break;
> +       case vect_early_exit_def:
> +       dump_printf (MSG_NOTE, "early exit\n");
> +       break;
>       case vect_unknown_def_type:
>         dump_printf (MSG_NOTE, "unknown\n");
>         break;
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 
> a4043e4a6568a9e8cfaf9298fe940289e165f9e2..1418913d2c308b0cf78352e29dc9958746fb9c94
>  100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -66,6 +66,7 @@ enum vect_def_type {
>    vect_double_reduction_def,
>    vect_nested_cycle,
>    vect_first_order_recurrence,
> +  vect_early_exit_def,
>    vect_unknown_def_type
>  };
>  
> @@ -888,6 +889,10 @@ public:
>       we need to peel off iterations at the end to form an epilogue loop.  */
>    bool peeling_for_niter;
>  
> +  /* When the loop has early breaks that we can vectorize we need to peel
> +     the loop for the break finding loop.  */
> +  bool early_breaks;
> +
>    /* List of loop additional IV conditionals found in the loop.  */
>    auto_vec<gcond *> conds;
>  
> @@ -942,6 +947,20 @@ public:
>    /* The controlling loop IV for the scalar loop being vectorized.  This IV
>       controls the natural exits of the loop.  */
>    edge scalar_loop_iv_exit;
> +
> +  /* Used to store the list of statements needing to be moved if doing early
> +     break vectorization as they would violate the scalar loop semantics if
> +     vectorized in their current location.  These are stored in order that 
> they need
> +     to be moved.  */
> +  auto_vec<gimple *> early_break_conflict;
> +
> +  /* The final basic block where to move statements to.  In the case of
> +     multiple exits this could be pretty far away.  */
> +  basic_block early_break_dest_bb;
> +
> +  /* Statements whose VUSES need updating if early break vectorization is to
> +     happen.  */
> +  auto_vec<gimple*> early_break_vuses;
>  } *loop_vec_info;
>  
>  /* Access Functions.  */
> @@ -996,6 +1015,10 @@ public:
>  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
>  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
>  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> +#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
> +#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> +#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
>  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
>  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
>  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> 
> 
> 
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to