Consider this simple loop:

long long arr[1024];
long long *f()
{
    int i;
    for (i = 0; i < 1024; i++)
      if (arr[i] == 42)
        break;
    return arr + i;
}

where today we generate this at -O3:

.L2:
        add     v29.4s, v29.4s, v25.4s
        add     v28.4s, v28.4s, v26.4s
        cmp     x2, x1
        beq     .L9
.L6:
        ldp     q30, q31, [x1], 32
        cmeq    v30.2d, v30.2d, v27.2d
        cmeq    v31.2d, v31.2d, v27.2d
        addhn   v31.2s, v31.2d, v30.2d
        fmov    x3, d31
        cbz     x3, .L2

which is highly inefficient.  This loop has 3 IVs (PR119577): one normal
scalar one and two vector ones, one counting up and one counting down
(PR115120).  It also has forced unrolling due to an increase in VF caused by
the mismatch in modes between the IVs and the loop body (PR119860).

This patch fixes all three of these issues and we now generate:

.L2:
        add     w2, w2, 2
        cmp     w2, 1024
        beq     .L13
.L5:
        ldr     q31, [x1]
        add     x1, x1, 16
        cmeq    v31.2d, v31.2d, v30.2d
        umaxp   v31.4s, v31.4s, v31.4s
        fmov    x0, d31
        cbz     x0, .L2

or with SVE:

.L3:
        add     x1, x1, x3
        whilelo p7.d, w1, w2
        b.none  .L11
.L4:
        ld1d    z30.d, p7/z, [x0, x1, lsl 3]
        cmpeq   p7.d, p7/z, z30.d, z31.d
        ptest   p15, p7.b
        b.none  .L3

which shows that the new scalar IV is efficiently merged with the loop
control IV by IVopts.

To accomplish this the patch reworks how we handle "forced live inductions"
with regard to vectorization.

Prior to this change, when we vectorized a loop with an early break, any
induction variables would be forced live.  Forcing them live means that even
though the values aren't used inside the loop, we must preserve them so that
when we start the scalar loop we can pass the correct initial values.
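
For illustration, here is a rough hand-written sketch (not the vectorizer's
actual output; block_has_42 is a made-up helper) of why the IV has to stay
live: after the vector loop takes an early exit, the scalar loop resumes from
wherever the vector code stopped, so it needs the current value of i as its
initial value:

extern long long arr[1024];

/* Hypothetical stand-in for one vector iteration's comparison, VF == 4.  */
static int block_has_42 (const long long *p)
{
    return p[0] == 42 || p[1] == 42 || p[2] == 42 || p[3] == 42;
}

long long *f_sketch (void)
{
    long long i = 0;
    for (; i + 4 <= 1024; i += 4)   /* "vector" loop.  */
      if (block_has_42 (&arr[i]))
        break;
    for (; i < 1024; i++)           /* scalar loop resumes from i.  */
      if (arr[i] == 42)
        break;
    return arr + i;
}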

However this had several side-effects:

1. We must be able to vectorize the induction.
2. The induction variable participates in VF determination.  This would often
   lead to a higher VF than would normally have been needed.  As such the
   vector loops become less profitable.
3. IVcanon on loops with a constant iteration count inserts a downward counting
   IV in addition to the upward one in order to support things like doloops
   (see the sketch below).  Normally this duplicate IV is removed by IVopts,
   but IVopts doesn't understand vector inductions.  As such we end up with
   3 IVs.
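
A rough source-level picture of the extra counter that IVcanon adds for a
constant trip count (the shape and names here are purely illustrative):

extern long long arr[1024];

long long *f_ivcanon_sketch (void)
{
    long long i = 0;
    unsigned long long ivtmp = 1024;   /* downward counting IV from IVcanon.  */
    do
      {
        if (arr[i] == 42)
          break;
        i++;
      }
    while (--ivtmp != 0);
    return arr + i;
}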

This patch fixes all three of these by choosing instead to "vectorize" the
forced live IVs as scalar statements.  This means we recreate the scalar IV
and advance it inside the loop by VF times the step rather than by the step.
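
Conceptually (a minimal sketch with made-up names, not the emitted GIMPLE),
the recreated IV is advanced once per vector iteration:

static long long scalar_iv_after (long long init, long long step,
                                  long long vf, long long vector_iters)
{
    long long scal_iv = init;          /* the recreated scalar IV PHI.  */
    for (long long it = 0; it < vector_iters; it++)
      scal_iv += step * vf;            /* one scalar add per vector iteration.  */
    return scal_iv;
}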

We have to create a new scalar IV because both LEN- and mask-based
vectorization can have loops where the effective VF changes between loop
iterations, which makes it impossible to determine in the early exits how many
elements have actually been processed.  For LEN-based loops this is easy to
see, and for masked loops it can happen with first faults, where you can get a
partial result which needs to be handled.
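
For the LEN-based case the sketch above changes to advancing by whatever
length was picked for that iteration (again purely illustrative names; the
real code obtains the length via .SELECT_VL / vect_get_loop_len):

static long long scalar_iv_with_len (long long init, long long step,
                                     long long n, long long vf)
{
    long long scal_iv = init;
    for (long long left = n; left > 0; )
      {
        long long len = left < vf ? left : vf;  /* per-iteration length.  */
        /* ... vector body processes len elements ... */
        scal_iv += step * len;
        left -= len;
      }
    return scal_iv;
}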

The new scalar IVs are represented as an SLP tree which has only DEFs and no
scalar statements:

note:   === vect_analyze_slp ===
note:   Final scalar def SLP tree for instance 0x52717e0:
note:   node 0x52a7f40 (max_nunits=1, refcnt=1)
note:   op template: i_10 = PHI <i_7(7), 0(2)>
note:           { }

This SLP tree is treated essentially as an external def for most of SLP
handling but is completely ignored by vect_make_slp_decision as it has no
statements.

As we now have a scalar SLP node type, all code handling forced live inductions
is removed, and the code scattered around vectorizable_induction and
vectorizable_live_operation is moved to a central location in a new function,
vectorizable_scalar_induction, which is plumbed into the relevant places in
statement analysis and code generation.

This makes the logic easier to follow as we no longer need to prepare statements
in different parts of the code for use in vectorizable_live_operation.

The new code also supports all induction types.  As mentioned before, the
different IVs are later harmonized by IVopts based on addressing-mode costs,
so we get better codegen as well, particularly for SVE, which supports complex
addressing modes.

Lastly, this also includes an easier-to-follow fix for a latent bug on trunk
where we did not correctly handle the main exit IVs because we were unable to
distinguish between main exits that require the last element vs the
one-before-last element.
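
To make the distinction concrete, here is a hedged source-level sketch (not
taken from the PRs): in the first loop the value escaping through the main
exit is the IV after its final increment, while in the second it is the value
of the last iteration itself, a shape common with 1-based array indexing:

extern long long arr[1024];

long long after_last (void)
{
    long long i;
    for (i = 0; i < 1024; i++)
      if (arr[i] == 42)
        break;
    return i;                 /* 1024 if no element matches.  */
}

long long last (void)
{
    long long i, idx = 0;
    for (i = 1; i <= 1024; i++)
      {
        idx = i;              /* value of the current iteration.  */
        if (arr[i - 1] == 42)
          break;
      }
    return idx;               /* 1024 if no element matches, not 1025.  */
}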

Bootstrapped Regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
-m32, -m64 and no issues

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        PR tree-optimization/115120
        PR tree-optimization/119577
        PR tree-optimization/119860
        * tree-vect-loop-manip.cc (vect_do_peeling): Store niters_vector_mult_vf
        inside loop_vinfo.
        * tree-vect-loop.cc (vect_peel_nonlinear_iv_init): Add some comments.
        (vect_update_nonlinear_iv): Use unsigned_type_for so that the function
        works for both vector and scalar types.
        (vect_build_plus_adjustment, dissolve_scalar_iv_phi_nodes,
        vectorizable_scalar_induction): New.
        (vectorizable_induction, vectorizable_live_operation): Remove early
        break IV handling code.
        * tree-vect-slp.cc (vect_analyze_slp): Create new scalar IV SLP nodes.
        (vect_slp_analyze_node_operations, vect_slp_analyze_operations,
        vect_schedule_slp_node, vect_schedule_scc): Support scalar IV nodes and
        instances.
        * tree-vect-stmts.cc (vect_stmt_relevant_p): Don't force early break IVs
        live but instead mark them.
        (can_vectorize_live_stmts): Remove analysis of forced live nodes as they
        no longer exist.
        (vect_analyze_stmt, vect_transform_stmt): Support scalar inductions.
        * tree-vectorizer.h (enum stmt_vec_info_type): Add scalar_iv_info_type.
        (enum slp_instance_kind): Add slp_inst_kind_scalar_iv.
        (class _loop_vec_info): Add niters_vector_mult_vf.
        (LOOP_VINFO_VECTOR_NITERS_VF, vectorizable_scalar_induction): New.

gcc/testsuite/ChangeLog:

        PR tree-optimization/115120
        PR tree-optimization/119577
        PR tree-optimization/119860
        * gcc.dg/vect/vect-early-break_39.c: Update.
        * gcc.target/aarch64/sve/peel_ind_9.c: Update.

---
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
index b3f40b8c9ba49e41bd283e46a462238c3b5825ef..bc862ad20e68db8f3c0ba6facf47e13a56a7cd6d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
@@ -23,5 +23,6 @@ unsigned test4(unsigned x, unsigned n)
  return ret;
 }
 
-/* cannot safely vectorize this due due to the group misalignment.  */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" } } */
+/* AArch64 will scalarize the load and is able to vectorize it.  */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" { target { ! aarch64*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_9.c b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_9.c
index cc904e88170f072e1d3c6be86643d99a7cd5cb12..14c7ea07b28e16dddebe4dc3743f90b32723c324 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_9.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_9.c
@@ -20,6 +20,7 @@ foo (void)
 }
 
 /* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
-/* Peels using a scalar loop.  */
-/* { dg-final { scan-tree-dump-not "pfa_iv_offset" "vect" } } */
+/* Peels using fully masked loop.  */
+/* { dg-final { scan-tree-dump "pfa_iv_offset" "vect" } } */
+/* { dg-final { scan-tree-dump "misalignment for fully-masked loop" "vect" } } */
 /* { dg-final { scan-tree-dump "Alignment of access forced using peeling" "vect" } } */
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 96ca273c24680556f16cdc9e465f490d7fcdb8a4..345d09d46185cb9e85e0b0d80ddff784a2802837 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3578,6 +3578,10 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
       else
 	vect_gen_vector_loop_niters_mult_vf (loop_vinfo, *niters_vector,
 					     &niters_vector_mult_vf);
+
+      /* Store niters_vector_mult_vf for later use.  */
+      LOOP_VINFO_VECTOR_NITERS_VF (loop_vinfo) = niters_vector_mult_vf;
+
       /* Update IVs of original loop as if they were advanced by
 	 niters_vector_mult_vf steps.  */
       gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 9320bf8e878d22faa5e202311649ffc05dbd6094..003c801c0193365f29fda0ffafd5b06c1ea8ee29 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2596,6 +2596,9 @@ again:
       if (SLP_TREE_DEF_TYPE (SLP_INSTANCE_TREE (instance)) != vect_internal_def)
 	continue;
 
+      if (SLP_TREE_SCALAR_STMTS (SLP_INSTANCE_TREE (instance)).is_empty ())
+	continue;
+
       stmt_vec_info vinfo;
       vinfo = SLP_TREE_SCALAR_STMTS (SLP_INSTANCE_TREE (instance))[0];
       if (! STMT_VINFO_GROUPED_ACCESS (vinfo))
@@ -8945,6 +8948,11 @@ vect_peel_nonlinear_iv_init (gimple_seq* stmts, tree init_expr,
   unsigned prec = TYPE_PRECISION (type);
   switch (induction_type)
     {
+    /* neg inductions are typically not used for loop termination conditions but
+       are typically implemented as b = -b.  That is every scalar iteration b is
+       negated.  That means that for the initial value of b we will have to
+       determine whether the number of skipped iteration is a multiple of 2
+       because every 2 scalar iterations we are back at "b".  */
     case vect_step_op_neg:
       if (TREE_INT_CST_LOW (skip_niters) % 2)
 	init_expr = gimple_build (stmts, NEGATE_EXPR, type, init_expr);
@@ -9060,9 +9068,7 @@ vect_update_nonlinear_iv (gimple_seq* stmts, tree vectype,
     case vect_step_op_mul:
       {
 	/* Use unsigned mult to avoid UD integer overflow.  */
-	tree uvectype
-	  = build_vector_type (unsigned_type_for (TREE_TYPE (vectype)),
-			       TYPE_VECTOR_SUBPARTS (vectype));
+	tree uvectype = unsigned_type_for (vectype);
 	vec_def = gimple_convert (stmts, uvectype, vec_def);
 	vec_step = gimple_convert (stmts, uvectype, vec_step);
 	vec_def = gimple_build (stmts, MULT_EXPR, uvectype,
@@ -9404,6 +9410,423 @@ vectorizable_nonlinear_induction (loop_vec_info loop_vinfo,
   return true;
 }
 
+/* Create an adjustment from BASE to BASE + OFFSET with type TYPE.
+   If BASE and OFFSET are not of the same type, emit conversions into STMTS.  If
+   BASE is a POINTER_TYPE_P then use a POINTER_PLUS_EXPR instead of PLUS_EXPR
+   and convert OFFSET to the appropriate type.  */
+static tree
+vect_build_plus_adjustment (gimple_seq *stmts, tree type, tree base,
+			    tree offset)
+{
+  if (POINTER_TYPE_P (type))
+    {
+      offset = gimple_convert (stmts, sizetype, offset);
+      return gimple_build (stmts, POINTER_PLUS_EXPR, type, base,
+			   gimple_convert (stmts, sizetype, offset));
+    }
+  else
+    {
+      offset = gimple_convert (stmts, type, offset);
+      return gimple_build (stmts, PLUS_EXPR, type, base, offset);
+    }
+}
+
+/* This function is only useful for updating PHI nodes with respect to early
+   break blocks.  This function updates blocks such as
+
+   BB x:
+     y = PHI<DEF, DEF, ...>
+
+   into
+     y = NEW_IV
+
+   or
+     y = RECOMP_IV
+
+   depending on whether the value occurs in a block where we expect the value of
+   the current scalar iteration or the previous one.  If we need the value of
+   RECOMP_IV then E_STMTS are first emitted in order to create the values.
+   Otherwise we elide it to keep the entries in the emitted BB cleaner.  */
+static void
+dissolve_scalar_iv_phi_nodes (loop_vec_info loop_vinfo, tree def,
+			      tree new_iv, tree recomp_iv, gimple_seq &e_stmts)
+{
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+
+  imm_use_iterator imm_iter;
+  gimple *use_stmt;
+  use_operand_p use_p;
+  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, def)
+    if (!is_gimple_debug (use_stmt)
+	&& !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
+	    {
+	      edge e = gimple_phi_arg_edge (as_a <gphi *> (use_stmt),
+					    phi_arg_index_from_use (use_p));
+	      gcc_assert (loop_exit_edge_p (loop, e));
+	      auto exit_gsi = gsi_last_nondebug_bb (e->dest);
+	      auto stmt = gsi_stmt (exit_gsi);
+	      /* We need to insert at the end, but can't do so across the
+		 jump.  */
+	      if (stmt && !is_a <gcond *>(stmt))
+		gsi_next (&exit_gsi);
+	      tree lhs_phi = gimple_phi_result (use_stmt);
+	      auto gsi = gsi_for_stmt (use_stmt);
+	      remove_phi_node (&gsi, false);
+	      tree iv_var = new_iv;
+	      if (recomp_iv
+		  && !LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)
+		  && LOOP_VINFO_IV_EXIT (loop_vinfo) == e)
+		{
+		  /* Emit any extra statement that may be needed to use
+		     recomp_iv.  */
+		  if (e_stmts)
+		    gsi_insert_seq_before (&exit_gsi, e_stmts, GSI_SAME_STMT);
+		  iv_var = recomp_iv;
+		  e_stmts = NULL;
+		}
+	      gimple *copy = gimple_build_assign (lhs_phi, iv_var);
+	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
+	      break;
+	    }
+}
+
+/* Function vectorizable_scalar_induction
+
+   Check if STMT_INFO performs a scalar induction computation that can be
+   used by early break vectorization where we need to know the starting value of
+   the IV.  If COST_VEC is NULL, transform the induction PHI: create
+   a replacement scalar PHI, record its definition on SLP_NODE, and add it to
+   the same basic block.
+   Return true if STMT_INFO is vectorizable in this way.  */
+
+bool
+vectorizable_scalar_induction (loop_vec_info loop_vinfo,
+			       stmt_vec_info stmt_info,
+			       slp_tree slp_node,
+			       stmt_vector_for_cost *cost_vec)
+{
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  class loop *iv_loop;
+  tree vec_def;
+  edge pe = loop_preheader_edge (loop);
+  tree vec_init;
+  gphi *induction_phi;
+  tree induc_def, vec_dest;
+  tree init_expr, step_expr, iv_step;
+  tree niters_skip = NULL_TREE;
+  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  gimple_stmt_iterator si;
+
+  gphi *phi = dyn_cast <gphi *> (stmt_info->stmt);
+
+  enum vect_induction_op_type induction_type
+    = STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info);
+
+  /* FORNOW. Only handle nonlinear induction in the same loop.  */
+  if (nested_in_vect_loop_p (loop, stmt_info)
+      && induction_type != vect_step_op_add)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "nonlinear induction in nested loop.\n");
+      return false;
+    }
+
+  iv_loop = loop;
+  gcc_assert (iv_loop == (gimple_bb (phi))->loop_father);
+
+  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
+  init_expr = vect_phi_initial_value (phi);
+  gcc_assert (init_expr != NULL);
+
+  /* A scalar IV with no step means it doesn't evolve.  Just
+     set it to 0.  This makes follow up adjustments easier as 0 just folds them
+     away.  */
+  if (!step_expr)
+    step_expr = build_zero_cst (TREE_TYPE (init_expr));
+
+  if (cost_vec) /* transformation not required.  */
+    {
+      switch (induction_type)
+	{
+	  case vect_step_op_add:
+	  case vect_step_op_mul:
+	  case vect_step_op_shl:
+	  case vect_step_op_shr:
+	  case vect_step_op_neg:
+	    break;
+	  default:
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "Unsupported scalar induction type for early break.");
+	      return false;
+	    }
+	}
+
+      /* We don't perform any costing here because it's impossible to tell the
+	 sequence of instructions needed for the different induction types.  In
+	 addition the expectation is that IVopts will unify the IVs so the final
+	 cost isn't known here yet.  Lastly most of the cost models will
+	 interpret scalar instructions during vect_body as vector statements and
+	 as such the cost of the loop becomes quite unrealistic.   */
+
+      SLP_TREE_TYPE (slp_node) = scalar_iv_info_type;
+      DUMP_VECT_SCOPE ("vectorizable_scalar_induction");
+      return true;
+    }
+
+  /* Transform.  */
+
+  /* Compute a scalar variable that represents the number of scalar iterations
+     the vector code has performed at the end of the relevant exit.  For early
+     exits we transform to the value at the start of the last vector iteration.
+     For the non-early exit the value depends on whether the main exit computes
+     i + step or i, i.e. the value after the last iteration or the last value.
+     This is determined by which LCSSA variable is found in the latch exit.  */
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform scalar induction phi.\n");
+
+  pe = loop_preheader_edge (iv_loop);
+  /* Find the first insertion point in the BB.  */
+  basic_block bb = gimple_bb (phi);
+  si = gsi_after_labels (bb);
+
+  gimple_seq stmts = NULL;
+  gimple_seq init_stmts = NULL;
+  gimple_seq iv_stmts = NULL;
+
+  niters_skip = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo);
+  tree ty_niters_skip = niters_skip ? TREE_TYPE (niters_skip) : NULL_TREE;
+
+  /* Create the induction-phi that defines the induction-operand.  */
+  tree scalar_type = TREE_TYPE (PHI_RESULT (phi));
+  vec_dest = vect_get_new_vect_var (scalar_type, vect_scalar_var, "scal_iv_");
+  induction_phi = create_phi_node (vec_dest, iv_loop->header);
+  induc_def = PHI_RESULT (induction_phi);
+
+  /* Create the iv update inside the loop.  */
+  stmts = NULL;
+  tree tree_vf = build_int_cst (scalar_type, vf);
+  if (SCALAR_FLOAT_TYPE_P (scalar_type))
+    tree_vf = gimple_convert (&init_stmts, scalar_type, tree_vf);
+
+  /* For loop-len targets we have to use .SELECT_VL (ivtmp_33, VF) instead of
+     just += VF, as the VF can change between loop iterations.  */
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+    {
+      vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
+      tree_vf = vect_get_loop_len (loop_vinfo, NULL, lens, 1,
+				   NULL_TREE, 0, 0);
+    }
+
+  /* Create the following def-use cycle:
+     loop prolog:
+     scalar_init = ...
+     scalar_step = ...
+     loop:
+     scalar_iv = PHI <scalar_init, vec_loop>
+     ...
+     STMT
+     ...
+     vec_loop = scalar_iv + scalar_step;  */
+  switch (induction_type)
+  {
+    case vect_step_op_add:
+      {
+	if (niters_skip)
+	  vec_init
+	    = vect_build_plus_adjustment (&init_stmts, scalar_type, init_expr,
+			gimple_convert (&init_stmts, scalar_type,
+			  gimple_build (&init_stmts, MINUS_EXPR, ty_niters_skip,
+					build_zero_cst (ty_niters_skip),
+					niters_skip)));
+	else
+	  vec_init = init_expr;
+
+	/* Use step * VF as the induction step so it can be CSEd.  */
+	vec_def
+	    = vect_build_plus_adjustment (&stmts, scalar_type, induc_def,
+		gimple_build (&stmts, MULT_EXPR, scalar_type, step_expr,
+			      tree_vf));
+	break;
+      }
+    case vect_step_op_mul:
+    case vect_step_op_shl:
+    case vect_step_op_shr:
+    case vect_step_op_neg:
+      {
+	if (niters_skip)
+	  vec_init = vect_peel_nonlinear_iv_init (&init_stmts, init_expr,
+						  niters_skip, step_expr,
+						  induction_type);
+	else
+	  vec_init = init_expr;
+
+	iv_step = vect_create_nonlinear_iv_step (&init_stmts, init_expr, vf,
+						 induction_type);
+	vec_def = vect_update_nonlinear_iv (&stmts, scalar_type, induc_def,
+					    iv_step, induction_type);
+	break;
+      }
+    default:
+      gcc_unreachable ();
+  }
+
+  /* If early break then we have to create a new PHI which we can use as
+     an offset to adjust the induction reduction in early exits.
+
+     This is because when peeling for alignment using masking, the first
+     few elements of the vector can be inactive.  As such if we find the
+     entry in the first iteration we have to adjust the starting point of
+     the scalar code.
+
+     We do this by creating a new scalar PHI that keeps track of whether
+     we are in the first iteration of the loop (with the additional masking)
+     or whether we have taken a loop iteration already.
+
+     The generated sequence:
+
+     pre-header:
+       bb1:
+         i_1 = <number of leading inactive elements>
+
+     header:
+       bb2:
+         i_2 = PHI <i_1(bb1), 0(latch)>
+         …
+
+     early-exit:
+       bb3:
+         i_3 = iv_step * i_2 + PHI<vector-iv>
+
+     The first part of the adjustment to create i_1 and i_2 are done here
+     and the last part creating i_3 is done in
+     vectorizable_live_operations when the induction extraction is
+     materialized.  */
+  if (niters_skip
+      && !LOOP_VINFO_MASK_NITERS_PFA_OFFSET (loop_vinfo))
+    {
+      tree ty_skip_niters = TREE_TYPE (niters_skip);
+      tree break_lhs_phi
+	= vect_get_new_vect_var (ty_skip_niters, vect_scalar_var,
+				 "pfa_iv_offset");
+      gphi *nphi = create_phi_node (break_lhs_phi, bb);
+      add_phi_arg (nphi, niters_skip, pe, UNKNOWN_LOCATION);
+      add_phi_arg (nphi, build_zero_cst (ty_skip_niters),
+		       loop_latch_edge (iv_loop), UNKNOWN_LOCATION);
+
+      LOOP_VINFO_MASK_NITERS_PFA_OFFSET (loop_vinfo) = PHI_RESULT (nphi);
+    }
+
+  /* Write the init_stmts in the loop-preheader block.  */
+  auto psi = gsi_last_nondebug_bb (pe->src);
+  gsi_insert_seq_after (&psi, init_stmts, GSI_LAST_NEW_STMT);
+  /* Write the adjustments in the loop header block.  */
+  gsi_insert_seq_before (&si, stmts, GSI_SAME_STMT);
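+  /* The original scalar IV's definition on the latch edge, i.e. its value
+     after the update in the loop body.  */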
+  tree induc_step_def
+    = gimple_phi_arg_def_from_edge (phi, loop_latch_edge (iv_loop));
+
+  /* Set the arguments of the phi node:  */
+  add_phi_arg (induction_phi, vec_init, pe, UNKNOWN_LOCATION);
+  add_phi_arg (induction_phi, vec_def, loop_latch_edge (iv_loop),
+	       UNKNOWN_LOCATION);
+
+  /* If we've done any peeling, calculate the peeling adjustment needed for
+     the final IV.  */
+  if (LOOP_VINFO_MASK_NITERS_PFA_OFFSET (loop_vinfo))
+    {
+      tree step_expr
+	= STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
+      tree break_lhs_phi
+	= LOOP_VINFO_MASK_NITERS_PFA_OFFSET (loop_vinfo);
+      tree ty_skip_niters = TREE_TYPE (break_lhs_phi);
+
+      /* Adjust the IV value used on the early exit path by the number of
+	 leading inactive elements (the PFA offset) scaled by the step.  */
+      tree rphi_step
+	= gimple_convert (&iv_stmts, ty_skip_niters, step_expr);
+      tree tmp2
+	= gimple_build (&iv_stmts, MULT_EXPR,
+			ty_skip_niters, rphi_step,
+			break_lhs_phi);
+
+      induc_def = vect_build_plus_adjustment (&iv_stmts, TREE_TYPE (induc_def),
+					      induc_def, tmp2);
+
+      basic_block exit_bb = NULL;
+      /* Identify the early exit merge block.  I wish we had stored this.  */
+      for (auto e : get_loop_exit_edges (iv_loop))
+	if (e != LOOP_VINFO_IV_EXIT (loop_vinfo))
+	  exit_bb = e->dest;
+
+      gcc_assert (exit_bb);
+      auto exit_gsi = gsi_after_labels (exit_bb);
+      gsi_insert_seq_before (&exit_gsi, iv_stmts, GSI_SAME_STMT);
+    }
+
+  tree induc_vec_def = vec_def;
+  tree recomp_induc_def = NULL_TREE;
+  gimple_seq e_stmts = NULL;
+  if (LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
+    induc_vec_def = induc_def;
+  else
+    {
+      /* When doing early-break we have to account for the situation where the
+	 loop structure is essentially:
+
+	 x_1 = PHI<x, y>
+	 ...
+	 x_2 = x_1 + step
+
+	 and the value returned in the latch exit is x_1 instead of x_2.  This
+	 happens a lot with Fortran because its arrays aren't 0-based.  We will
+	 generate the statements, but only emit them if they are needed.
+
+	 We use niters_vector_mult_vf because helpers like
+	 vect_gen_vector_loop_niters_mult_vf have already calculated the
+	 correct number of iterations in this scenario, which makes the
+	 adjustment easier.  If the starting index ends up being 0 then this
+	 is all folded away.  */
+      tree niters_vf
+	= gimple_convert (&e_stmts, scalar_type,
+			  LOOP_VINFO_VECTOR_NITERS_VF (loop_vinfo));
+      tree step = gimple_convert (&e_stmts, scalar_type, step_expr);
+
+      /* n_minus_1 = niters_vf - 1.  */
+      tree n_minus_1 = gimple_build (&e_stmts, MINUS_EXPR, scalar_type,
+				     niters_vf, build_one_cst (scalar_type));
+
+      /* delta = (niters_vf - 1) * step.   */
+      tree delta = gimple_build (&e_stmts, MULT_EXPR, scalar_type, n_minus_1,
+				 step);
+
+      /* j_exit = init_expr + (niters_vf - 1) * step.  */
+      recomp_induc_def = vect_build_plus_adjustment (&e_stmts, scalar_type,
+						     init_expr, delta);
+    }
+
+  /* We have to dissolve the PHI back to an assignment since PHIs are always
+     at the start of the block.  This is safe due to all early exits being
+     pushed to the same block.  As such the PHI elements are all the same.  */
+  dissolve_scalar_iv_phi_nodes (loop_vinfo, PHI_RESULT (phi), induc_def,
+				recomp_induc_def, e_stmts);
+
+  /* Rewrite any usage of the latch iteration PHI if present.  */
+  dissolve_scalar_iv_phi_nodes (loop_vinfo, induc_step_def, induc_vec_def,
+				NULL_TREE, e_stmts);
+
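+  /* Record the new scalar induction PHI as the only def this SLP node
+     produces.  */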
+  slp_node->push_vec_def (induction_phi);
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "transform scalar induction: created def-use cycle: %G%T",
+		     (gimple *) induction_phi, vec_def);
+  return true;
+}
+
 /* Function vectorizable_induction
 
    Check if STMT_INFO performs an induction computation that can be vectorized.
@@ -9690,53 +10113,6 @@ vectorizable_induction (loop_vec_info loop_vinfo,
 				   LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo));
       peel_mul = gimple_build_vector_from_val (&init_stmts,
 					       step_vectype, peel_mul);
-
-      /* If early break then we have to create a new PHI which we can use as
-	 an offset to adjust the induction reduction in early exits.
-
-	 This is because when peeling for alignment using masking, the first
-	 few elements of the vector can be inactive.  As such if we find the
-	 entry in the first iteration we have adjust the starting point of
-	 the scalar code.
-
-	 We do this by creating a new scalar PHI that keeps track of whether
-	 we are the first iteration of the loop (with the additional masking)
-	 or whether we have taken a loop iteration already.
-
-	 The generated sequence:
-
-	 pre-header:
-	   bb1:
-	     i_1 = <number of leading inactive elements>
-
-	   header:
-	   bb2:
-	     i_2 = PHI <i_1(bb1), 0(latch)>
-	     …
-
-	   early-exit:
-	   bb3:
-	     i_3 = iv_step * i_2 + PHI<vector-iv>
-
-	 The first part of the adjustment to create i_1 and i_2 are done here
-	 and the last part creating i_3 is done in
-	 vectorizable_live_operations when the induction extraction is
-	 materialized.  */
-      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
-	  && !LOOP_VINFO_MASK_NITERS_PFA_OFFSET (loop_vinfo))
-	{
-	  auto skip_niters = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo);
-	  tree ty_skip_niters = TREE_TYPE (skip_niters);
-	  tree break_lhs_phi = vect_get_new_vect_var (ty_skip_niters,
-						      vect_scalar_var,
-						      "pfa_iv_offset");
-	  gphi *nphi = create_phi_node (break_lhs_phi, bb);
-	  add_phi_arg (nphi, skip_niters, pe, UNKNOWN_LOCATION);
-	  add_phi_arg (nphi, build_zero_cst (ty_skip_niters),
-		       loop_latch_edge (iv_loop), UNKNOWN_LOCATION);
-
-	  LOOP_VINFO_MASK_NITERS_PFA_OFFSET (loop_vinfo) = PHI_RESULT (nphi);
-	}
     }
   tree step_mul = NULL_TREE;
   unsigned ivn;
@@ -10312,8 +10688,7 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 		 to the latch then we're restarting the iteration in the
 		 scalar loop.  So get the first live value.  */
 	      bool early_break_first_element_p
-		= (all_exits_as_early_p || !main_exit_edge)
-		   && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def;
+		= all_exits_as_early_p || !main_exit_edge;
 	      if (early_break_first_element_p)
 		{
 		  tmp_vec_lhs = vec_lhs0;
@@ -10322,52 +10697,13 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 
 	      gimple_stmt_iterator exit_gsi;
 	      tree new_tree
-		= vectorizable_live_operation_1 (loop_vinfo,
-						 e->dest, vectype,
-						 slp_node, bitsize,
-						 tmp_bitstart, tmp_vec_lhs,
-						 lhs_type, &exit_gsi);
+		  = vectorizable_live_operation_1 (loop_vinfo,
+						   e->dest, vectype,
+						   slp_node, bitsize,
+						   tmp_bitstart, tmp_vec_lhs,
+						   lhs_type, &exit_gsi);
 
 	      auto gsi = gsi_for_stmt (use_stmt);
-	      if (early_break_first_element_p
-		  && LOOP_VINFO_MASK_NITERS_PFA_OFFSET (loop_vinfo))
-		{
-		  tree step_expr
-		    = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
-		  tree break_lhs_phi
-		    = LOOP_VINFO_MASK_NITERS_PFA_OFFSET (loop_vinfo);
-		  tree ty_skip_niters = TREE_TYPE (break_lhs_phi);
-		  gimple_seq iv_stmts = NULL;
-
-		  /* Now create the PHI for the outside loop usage to
-		     retrieve the value for the offset counter.  */
-		  tree rphi_step
-		    = gimple_convert (&iv_stmts, ty_skip_niters, step_expr);
-		  tree tmp2
-		    = gimple_build (&iv_stmts, MULT_EXPR,
-				    ty_skip_niters, rphi_step,
-				    break_lhs_phi);
-
-		  if (POINTER_TYPE_P (TREE_TYPE (new_tree)))
-		    {
-		      tmp2 = gimple_convert (&iv_stmts, sizetype, tmp2);
-		      tmp2 = gimple_build (&iv_stmts, POINTER_PLUS_EXPR,
-					   TREE_TYPE (new_tree), new_tree,
-					   tmp2);
-		    }
-		  else
-		    {
-		      tmp2 = gimple_convert (&iv_stmts, TREE_TYPE (new_tree),
-					     tmp2);
-		      tmp2 = gimple_build (&iv_stmts, PLUS_EXPR,
-					   TREE_TYPE (new_tree), new_tree,
-					   tmp2);
-		    }
-
-		  new_tree = tmp2;
-		  gsi_insert_seq_before (&exit_gsi, iv_stmts, GSI_SAME_STMT);
-		}
-
 	      tree lhs_phi = gimple_phi_result (use_stmt);
 	      remove_phi_node (&gsi, false);
 	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9698709f5671971c35a50a16a258874beb44514a..e050f34d2578ed9168ff30ec02ca746c132a08fe 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -5620,6 +5620,44 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size,
 	      }
 	  }
 
+      /* Find and create slp instances for inductions that have been forced
+	 live due to early break.  */
+      edge latch_e = loop_latch_edge (LOOP_VINFO_LOOP (loop_vinfo));
+      for (auto stmt_info : LOOP_VINFO_EARLY_BREAKS_LIVE_IVS (loop_vinfo))
+	  {
+	    gphi *phi = as_a<gphi *> (STMT_VINFO_STMT (stmt_info));
+	    tree def = gimple_phi_arg_def_from_edge (phi, latch_e);
+
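+	    /* Build a single-lane SLP node with no scalar statements; record
+	       the latch def as its only def and use the PHI as the
+	       representative statement.  */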
+	    slp_tree node = vect_create_new_slp_node (vNULL);
+	    SLP_TREE_VECTYPE (node) = NULL_TREE;
+	    SLP_TREE_LANES (node) = 1;
+	    SLP_TREE_DEF_TYPE (node) = vect_internal_def;
+	    SLP_TREE_VEC_DEFS (node).safe_push (def);
+	    SLP_TREE_REPRESENTATIVE (node) = stmt_info;
+
+	    /* Create a new SLP instance.  */
+	    slp_instance new_instance = XNEW (class _slp_instance);
+	    SLP_INSTANCE_TREE (new_instance) = node;
+	    SLP_INSTANCE_LOADS (new_instance) = vNULL;
+	    SLP_INSTANCE_ROOT_STMTS (new_instance) = vNULL;
+	    SLP_INSTANCE_REMAIN_DEFS (new_instance) = vNULL;
+	    SLP_INSTANCE_KIND (new_instance) = slp_inst_kind_scalar_iv;
+	    new_instance->reduc_phis = NULL;
+	    new_instance->cost_vec = vNULL;
+	    new_instance->subgraph_entries = vNULL;
+
+	    vinfo->slp_instances.safe_push (new_instance);
+
+	    if (dump_enabled_p ())
+	      {
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "Final scalar def SLP tree for instance %p:\n",
+				 (void *) new_instance);
+		vect_print_slp_graph (MSG_NOTE, vect_location,
+				      SLP_INSTANCE_TREE (new_instance));
+	      }
+	  }
+
       /* Find SLP sequences starting from gconds.  */
       for (auto cond : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
 	{
@@ -5664,48 +5702,6 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size,
 					     "SLP build failed.\n");
 	    }
 	}
-
-	/* Find and create slp instances for inductions that have been forced
-	   live due to early break.  */
-	edge latch_e = loop_latch_edge (LOOP_VINFO_LOOP (loop_vinfo));
-	for (auto stmt_info : LOOP_VINFO_EARLY_BREAKS_LIVE_IVS (loop_vinfo))
-	  {
-	    vec<stmt_vec_info> stmts;
-	    vec<stmt_vec_info> roots = vNULL;
-	    vec<tree> remain = vNULL;
-	    gphi *phi = as_a<gphi *> (STMT_VINFO_STMT (stmt_info));
-	    tree def = gimple_phi_arg_def_from_edge (phi, latch_e);
-	    stmt_vec_info lc_info = loop_vinfo->lookup_def (def);
-	    if (lc_info)
-	      {
-		stmts.create (1);
-		stmts.quick_push (vect_stmt_to_vectorize (lc_info));
-		if (! vect_build_slp_instance (vinfo, slp_inst_kind_reduc_group,
-					       stmts, roots, remain,
-					       max_tree_size, &limit,
-					       bst_map, force_single_lane))
-		  return opt_result::failure_at (vect_location,
-						 "SLP build failed.\n");
-	      }
-	    /* When the latch def is from a different cycle this can only
-	       be a induction.  Build a simple instance for this.
-	       ???  We should be able to start discovery from the PHI
-	       for all inductions, but then there will be stray
-	       non-SLP stmts we choke on as needing non-SLP handling.  */
-	    auto_vec<stmt_vec_info, 1> tem;
-	    tem.quick_push (stmt_info);
-	    if (!bst_map->get (tem))
-	      {
-		stmts.create (1);
-		stmts.quick_push (stmt_info);
-		if (! vect_build_slp_instance (vinfo, slp_inst_kind_reduc_group,
-					       stmts, roots, remain,
-					       max_tree_size, &limit,
-					       bst_map, force_single_lane))
-		  return opt_result::failure_at (vect_location,
-						 "SLP build failed.\n");
-	      }
-	  }
     }
 
   hash_set<slp_tree> visited_patterns;
@@ -8542,6 +8538,7 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
      insertion place.  */
   if (res
       && !seen_non_constant_child
+      && SLP_INSTANCE_KIND (node_instance) != slp_inst_kind_scalar_iv
       && SLP_TREE_SCALAR_STMTS (node).is_empty ())
     {
       if (dump_enabled_p ())
@@ -8986,8 +8983,10 @@ vect_slp_analyze_operations (vec_info *vinfo)
 	  stmt_vec_info stmt_info;
 	  if (!SLP_INSTANCE_ROOT_STMTS (instance).is_empty ())
 	    stmt_info = SLP_INSTANCE_ROOT_STMTS (instance)[0];
-	  else
+	  else if (!SLP_TREE_SCALAR_STMTS (node).is_empty ())
 	    stmt_info = SLP_TREE_SCALAR_STMTS (node)[0];
+	  else
+	    stmt_info = SLP_TREE_REPRESENTATIVE (node);
 	  if (is_a <loop_vec_info> (vinfo))
 	    {
 	      if (dump_enabled_p ())
@@ -11617,7 +11616,8 @@ vect_schedule_slp_node (vec_info *vinfo,
 
   stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (node);
 
-  gcc_assert (SLP_TREE_VEC_DEFS (node).is_empty ());
+  gcc_assert (SLP_TREE_VEC_DEFS (node).is_empty ()
+	      || SLP_INSTANCE_KIND (instance) == slp_inst_kind_scalar_iv);
   if (SLP_TREE_VECTYPE (node))
     SLP_TREE_VEC_DEFS (node).create (vect_get_num_copies (vinfo, node));
 
@@ -11636,7 +11636,8 @@ vect_schedule_slp_node (vec_info *vinfo,
   else if (!SLP_TREE_PERMUTE_P (node)
 	   && (SLP_TREE_TYPE (node) == cycle_phi_info_type
 	       || SLP_TREE_TYPE (node) == induc_vec_info_type
-	       || SLP_TREE_TYPE (node) == phi_info_type))
+	       || SLP_TREE_TYPE (node) == phi_info_type
+	       || SLP_TREE_TYPE (node) == scalar_iv_info_type))
     {
       /* For PHI node vectorization we do not use the insertion iterator.  */
       si = gsi_none ();
@@ -12024,7 +12025,8 @@ vect_schedule_scc (vec_info *vinfo, slp_tree node, slp_instance instance,
   maxdfs++;
 
   /* Leaf.  */
-  if (SLP_TREE_DEF_TYPE (node) != vect_internal_def)
+  if (SLP_TREE_DEF_TYPE (node) != vect_internal_def
+      || SLP_INSTANCE_KIND (instance) == slp_inst_kind_scalar_iv)
     {
       info->on_stack = false;
       vect_schedule_slp_node (vinfo, node, instance);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 83acbb3ff67ccdd4a39606850a23f483d6a4b1fb..5bddd7f37da4ea1048998b2c82ed464aa10a6730 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -435,7 +435,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 			 "vec_stmt_relevant_p: PHI forced live for "
 			 "early break.\n");
       LOOP_VINFO_EARLY_BREAKS_LIVE_IVS (loop_vinfo).safe_push (stmt_info);
-      *live_p = true;
+      return true;
     }
 
   if (*live_p && *relevant == vect_unused_in_scope
@@ -12750,17 +12750,12 @@ can_vectorize_live_stmts (vec_info *vinfo,
 			  bool vec_stmt_p,
 			  stmt_vector_for_cost *cost_vec)
 {
-  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
   stmt_vec_info slp_stmt_info;
   unsigned int i;
   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt_info)
     {
       if (slp_stmt_info
-	  && (STMT_VINFO_LIVE_P (slp_stmt_info)
-	      || (loop_vinfo
-		  && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
-		  && STMT_VINFO_DEF_TYPE (slp_stmt_info)
-		  == vect_induction_def))
+	  && STMT_VINFO_LIVE_P (slp_stmt_info)
 	  && !vectorizable_live_operation (vinfo, slp_stmt_info, slp_node,
 					   slp_node_instance, i,
 					   vec_stmt_p, cost_vec))
@@ -12796,6 +12791,21 @@ vect_analyze_stmt (vec_info *vinfo,
 				     stmt_info->stmt);
     }
 
+  /* Check if it's a scalar IV that we can code generate.  Scalar IVs aren't
+     forced live as we don't want the vectorizer to analyze them: we don't set
+     e.g. vectype and we don't want them to participate in determining the
+     VF.  */
+  if (SLP_INSTANCE_KIND (node_instance) == slp_inst_kind_scalar_iv
+      && is_a <loop_vec_info> (vinfo))
+    {
+      if (!vectorizable_scalar_induction (as_a <loop_vec_info> (vinfo),
+					  stmt_info, node, cost_vec))
+	return opt_result::failure_at (stmt_info->stmt,
+				       "not vectorized:"
+				       " scalar IV not supported: %G",
+				       stmt_info->stmt);
+      return opt_result::success ();
+    }
+
   /* Skip stmts that do not need to be vectorized.  */
   if (!STMT_VINFO_RELEVANT_P (stmt_info)
       && !STMT_VINFO_LIVE_P (stmt_info))
@@ -12852,6 +12862,7 @@ vect_analyze_stmt (vec_info *vinfo,
       gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
       gcc_assert (SLP_TREE_VECTYPE (node)
 		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
+		  || SLP_INSTANCE_KIND (node_instance) == slp_inst_kind_scalar_iv
 		  || (call && gimple_call_lhs (call) == NULL_TREE));
     }
 
@@ -13031,6 +13042,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case scalar_iv_info_type:
+      done = vectorizable_scalar_induction (as_a <loop_vec_info> (vinfo),
+					    stmt_info, slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     case permute_info_type:
       done = vectorizable_slp_permutation (vinfo, gsi, slp_node, NULL);
       gcc_assert (done);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 905a29142d3eb8077ab9fb29b3cceb04834848fe..d0aabebccdf0e308999d39378ebd0c1b503b50f7 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -243,6 +243,7 @@ enum stmt_vec_info_type {
   phi_info_type,
   recurr_info_type,
   loop_exit_ctrl_vec_info_type,
+  scalar_iv_info_type,
   permute_info_type
 };
 
@@ -392,7 +393,8 @@ enum slp_instance_kind {
     slp_inst_kind_reduc_chain,
     slp_inst_kind_bb_reduc,
     slp_inst_kind_ctor,
-    slp_inst_kind_gcond
+    slp_inst_kind_gcond,
+    slp_inst_kind_scalar_iv
 };
 
 /* SLP instance is a sequence of stmts in a loop that can be packed into
@@ -1236,6 +1238,10 @@ public:
      happen.  */
   auto_vec<gimple*> early_break_vuses;
 
+  /* The number of scalar iterations performed by the vector code when the
+     loop exits via the main exit block.  This can be an SSA name or a
+     constant.  */
+  tree niters_vector_mult_vf;
+
   /* Record statements that are needed to be live for early break vectorization
      but may not have an LC PHI node materialized yet in the exits.  */
   auto_vec<stmt_vec_info> early_break_live_ivs;
@@ -1306,6 +1312,7 @@ public:
   (L)->early_break_live_ivs
 #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
 #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
+#define LOOP_VINFO_VECTOR_NITERS_VF(L)     (L)->niters_vector_mult_vf
 #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
 #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
@@ -2705,6 +2712,8 @@ extern bool vectorizable_recurr (loop_vec_info, stmt_vec_info,
 extern bool vectorizable_early_exit (loop_vec_info, stmt_vec_info,
 				     gimple_stmt_iterator *,
 				     slp_tree, stmt_vector_for_cost *);
+extern bool vectorizable_scalar_induction (loop_vec_info, stmt_vec_info,
+					   slp_tree, stmt_vector_for_cost *);
 extern bool vect_emulated_vector_p (tree);
 extern bool vect_can_vectorize_without_simd_p (tree_code);
 extern bool vect_can_vectorize_without_simd_p (code_helper);
