On Fri, 23 Aug 2019, Uros Bizjak wrote:

> On Thu, Aug 22, 2019 at 3:35 PM Richard Biener <rguent...@suse.de> wrote:
> >
> >
> > This fixes quadraticness in STV and makes
> >
> >  machine dep reorg                  :  89.07 ( 95%)   0.02 ( 18%)  89.10 ( 95%)      54 kB (  0%)
> >
> > drop to zero.  Does anybody remember why it is the way it is now?
> >
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >
> > OK?
> 
> Looking at the PR, comment #3 [1], I assume this patch is obsolete
> and will be updated?
> 
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3

Yes.  I'm still learning how STV operates (and learning DF and RTL...).
The following is a rewrite of the non-TImode chain conversion
according to how I think it should operate, also allowing the hunk
that fixes the compile-time issue and fixing PR91527 on the way
(which I ran into during more extensive testing of the patch myself).

So compared to the state before, which I still do not 100% understand,
we now do the following.  Chain detection works as before, including
recording of all defs (both defined by the insns in the chain and
insns outside) that need copy-in or copy-out operations.

But then the patch changes things so as to guarantee that
after the conversion all uses/defs of a pseudo are either
of the (subreg:Vmode ..) form or of the original scalar form.
In particular it avoids the need to change any insns that
are not part of the chain (besides emitting the extra copy
instructions).  To achieve this, for all defs that were marked
as needing copies (thus they have uses/defs both in the chain and
outside) the chain will use a new pseudo that we copy to from scalar
sources and that we copy from for scalar uses.  There's the new
defs_map which records the mapping of old to new reg.  Pseudos that
are already only used in the chain are not remapped.
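
To illustrate the remapping idea, here is a minimal standalone sketch.
It does not use GCC's rtx or hash_map types; Reg, build_defs_map and
chain_reg are hypothetical names made up for this example, and plain
integers stand in for pseudo registers.

```cpp
#include <cassert>
#include <set>
#include <unordered_map>

// Hypothetical stand-in for a pseudo register number; not GCC's rtx.
using Reg = int;

// Give every register that was marked as needing copies (it has
// uses/defs both inside and outside the chain) a fresh chain-private
// pseudo.  Registers used only inside the chain are not entered into
// the map and therefore keep their original number.
std::unordered_map<Reg, Reg>
build_defs_map (const std::set<Reg> &defs_conv, Reg &next_pseudo)
{
  std::unordered_map<Reg, Reg> defs_map;
  for (Reg r : defs_conv)
    {
      bool inserted = defs_map.emplace (r, next_pseudo++).second;
      assert (inserted);  // std::set holds each register only once.
    }
  return defs_map;
}

// The register an insn inside the chain should reference: the
// remapped pseudo if there is one, otherwise the original register.
Reg
chain_reg (const std::unordered_map<Reg, Reg> &defs_map, Reg r)
{
  auto it = defs_map.find (r);
  return it == defs_map.end () ? r : it->second;
}
```

With defs_conv = {90, 100} and the next free pseudo being 200, regs 90
and 100 map to 200 and 201 inside the chain, while any other register
passed to chain_reg is returned unchanged.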

The conversion itself then happens in two stages.  First,
in make_vector_copies, we emit the copy-in insns and
allocate all pseudos we need.  Then the rest of the
conversion happens fully inside of convert_insn, where
we generate the copy-outs of the insn's def, replace
defs and uses according to the mapping, and convert uses
and defs to the (subreg:Vmode ..) style.
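
The decisions made in the two stages can be modelled roughly as
follows.  This is only a sketch under simplifying assumptions:
InsnUid, needs_copy_in and needs_copy_out are hypothetical names, a
def is described just by the UID of its insn plus the UIDs of the
insns using it, and uses inside memory addresses (which the patch
treats as scalar uses) are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for an insn UID.
using InsnUid = int;

// Stage 1 (cf. make_vector_copies): a def made outside the chain
// needs a copy-in if at least one of its uses is inside the chain.
bool
needs_copy_in (InsnUid def_insn, const std::vector<InsnUid> &use_insns,
               const std::set<InsnUid> &chain)
{
  if (chain.count (def_insn))
    return false;
  return std::any_of (use_insns.begin (), use_insns.end (),
                      [&] (InsnUid u) { return chain.count (u) != 0; });
}

// Stage 2 (cf. convert_insn): a def made inside the chain needs a
// copy-out if at least one of its uses is outside the chain.
bool
needs_copy_out (InsnUid def_insn, const std::vector<InsnUid> &use_insns,
                const std::set<InsnUid> &chain)
{
  if (!chain.count (def_insn))
    return false;
  return std::any_of (use_insns.begin (), use_insns.end (),
                      [&] (InsnUid u) { return chain.count (u) == 0; });
}
```

For a chain {10, 11, 12}, a def in insn 5 used by insns 10 and 20
needs a copy-in, and a def in insn 10 used by insns 11 and 30 needs a
copy-out; a def in insn 10 used only by insns 11 and 12 needs neither.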

For PR91527 IRA doesn't like the REG_EQUIV note in

(insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
        (subreg:V4SI (reg:SI 100) 0)) "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4 1248 {movv4si_internal}
     (expr_list:REG_DEAD (reg:SI 100)
        (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
                    (const_int 16 [0x10])) [1 c+0 S4 A64])
            (nil))))

because the SET_DEST is not a REG_P.  I'm not sure if this
is invalid RTL; the docs say SET_DEST might be a strict_low_part
or a zero_extract but don't mention a subreg.  So I opted
to simply remove equal/equiv notes on insns we convert,
and since the above has a REG_DEAD note I took the liberty
to update that according to the mapping (which would not have
been needed before this patch) rather than dropping it.

Bootstrapped with and without --with-march=westmere (to get
some STV coverage, this included all languages) on
x86_64-unknown-linux-gnu, testing in progress.

OK if testing succeeds?

It still solves the compile-time issue (which is a latent issue,
btw, and could be triggered with a carefully crafted testcase ever
since STV has existed for DImode chains with !TARGET_64BIT).
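
As a rough model of where the quadratic behavior comes from, assume
many insns in a chain define the same pseudo.  The sketch below only
counts how often a def is visited; visits_walking_all_defs and
visits_own_def_only are hypothetical names, not GCC functions, and
def_reg[i] is simply the register number defined by chain insn i.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Old add_insn shape: for each insn def, every def of that register
// is visited, so N insns defining one register cost N * N visits.
std::size_t
visits_walking_all_defs (const std::vector<int> &def_reg)
{
  std::unordered_map<int, std::size_t> defs_per_reg;
  for (int r : def_reg)
    ++defs_per_reg[r];

  std::size_t visits = 0;
  for (int r : def_reg)
    visits += defs_per_reg[r];
  assert (visits >= def_reg.size ());
  return visits;
}

// Patched shape: only the insn's own def is visited.
std::size_t
visits_own_def_only (const std::vector<int> &def_reg)
{
  return def_reg.size ();
}
```

For 1000 insns all defining the same register, the first function
counts 1000000 visits and the second 1000; when every insn defines a
different register, both counts are the same.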

Thanks,
Richard.

2019-08-22  Richard Biener  <rguent...@suse.de>

        PR target/91522
        PR target/91527
        * config/i386/i386-features.h (general_scalar_chain::defs_map):
        New member.
        (general_scalar_chain::replace_with_subreg): Remove.
        (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
        (general_scalar_chain::convert_reg): Adjust signature.
        * config/i386/i386-features.c (scalar_chain::add_insn): Do not
        iterate over all defs of a reg.
        (general_scalar_chain::replace_with_subreg): Remove.
        (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
        (general_scalar_chain::make_vector_copies): Populate defs_map,
        place copy only after defs that are used as vectors in the chain.
        (general_scalar_chain::convert_reg): Emit a copy for a specific
        def in a specific instruction.
        (general_scalar_chain::convert_op): All reg uses are converted here.
        (general_scalar_chain::convert_insn): Emit copies for scalar
        uses of defs here.  Replace uses with the copies we created.
        Replace and convert the def.  Adjust REG_DEAD notes, remove
        REG_EQUIV/EQUAL notes.
        (general_scalar_chain::convert_registers): Only handle copies
        into the chain here.

Index: gcc/config/i386/i386-features.c
===================================================================
--- gcc/config/i386/i386-features.c     (revision 274843)
+++ gcc/config/i386/i386-features.c     (working copy)
@@ -416,13 +416,9 @@ scalar_chain::add_insn (bitmap candidate
      iterates over all refs to look for dual-mode regs.  Instead this
      should be done separately for all regs mentioned in the chain once.  */
   df_ref ref;
-  df_ref def;
   for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
     if (!HARD_REGISTER_P (DF_REF_REG (ref)))
-      for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
-          def;
-          def = DF_REF_NEXT_REG (def))
-       analyze_register_chain (candidates, def);
+      analyze_register_chain (candidates, ref);
   for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
     if (!DF_REF_REG_MEM_P (ref))
       analyze_register_chain (candidates, ref);
@@ -605,42 +601,6 @@ general_scalar_chain::compute_convert_ga
   return gain;
 }
 
-/* Replace REG in X with a V2DI subreg of NEW_REG.  */
-
-rtx
-general_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg)
-{
-  if (x == reg)
-    return gen_rtx_SUBREG (vmode, new_reg, 0);
-
-  /* But not in memory addresses.  */
-  if (MEM_P (x))
-    return x;
-
-  const char *fmt = GET_RTX_FORMAT (GET_CODE (x));
-  int i, j;
-  for (i = GET_RTX_LENGTH (GET_CODE (x)) - 1; i >= 0; i--)
-    {
-      if (fmt[i] == 'e')
-       XEXP (x, i) = replace_with_subreg (XEXP (x, i), reg, new_reg);
-      else if (fmt[i] == 'E')
-       for (j = XVECLEN (x, i) - 1; j >= 0; j--)
-         XVECEXP (x, i, j) = replace_with_subreg (XVECEXP (x, i, j),
-                                                  reg, new_reg);
-    }
-
-  return x;
-}
-
-/* Replace REG in INSN with a V2DI subreg of NEW_REG.  */
-
-void
-general_scalar_chain::replace_with_subreg_in_insn (rtx_insn *insn,
-                                                 rtx reg, rtx new_reg)
-{
-  replace_with_subreg (single_set (insn), reg, new_reg);
-}
-
 /* Insert generated conversion instruction sequence INSNS
    after instruction AFTER.  New BB may be required in case
    instruction has EH region attached.  */
@@ -691,204 +651,147 @@ general_scalar_chain::make_vector_copies
   rtx vreg = gen_reg_rtx (smode);
   df_ref ref;
 
-  for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
-    if (!bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
-      {
-       start_sequence ();
-       if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
-         {
-           rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
-           if (smode == DImode && !TARGET_64BIT)
-             {
-               emit_move_insn (adjust_address (tmp, SImode, 0),
-                               gen_rtx_SUBREG (SImode, reg, 0));
-               emit_move_insn (adjust_address (tmp, SImode, 4),
-                               gen_rtx_SUBREG (SImode, reg, 4));
-             }
-           else
-             emit_move_insn (copy_rtx (tmp), reg);
-           emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
-                                   gen_gpr_to_xmm_move_src (vmode, tmp)));
-         }
-       else if (!TARGET_64BIT && smode == DImode)
-         {
-           if (TARGET_SSE4_1)
-             {
-               emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
-                                           CONST0_RTX (V4SImode),
-                                           gen_rtx_SUBREG (SImode, reg, 0)));
-               emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
-                                             gen_rtx_SUBREG (V4SImode, vreg, 0),
-                                             gen_rtx_SUBREG (SImode, reg, 4),
-                                             GEN_INT (2)));
-             }
-           else
-             {
-               rtx tmp = gen_reg_rtx (DImode);
-               emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
-                                           CONST0_RTX (V4SImode),
-                                           gen_rtx_SUBREG (SImode, reg, 0)));
-               emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
-                                           CONST0_RTX (V4SImode),
-                                           gen_rtx_SUBREG (SImode, reg, 4)));
-               emit_insn (gen_vec_interleave_lowv4si
-                          (gen_rtx_SUBREG (V4SImode, vreg, 0),
-                           gen_rtx_SUBREG (V4SImode, vreg, 0),
-                           gen_rtx_SUBREG (V4SImode, tmp, 0)));
-             }
-         }
-       else
-         emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
-                                 gen_gpr_to_xmm_move_src (vmode, reg)));
-       rtx_insn *seq = get_insns ();
-       end_sequence ();
-       rtx_insn *insn = DF_REF_INSN (ref);
-       emit_conversion_insns (seq, insn);
-
-       if (dump_file)
-         fprintf (dump_file,
-                  "  Copied r%d to a vector register r%d for insn %d\n",
-                  regno, REGNO (vreg), INSN_UID (insn));
-      }
-
-  for (ref = DF_REG_USE_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
-    if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
-      {
-       rtx_insn *insn = DF_REF_INSN (ref);
-       replace_with_subreg_in_insn (insn, reg, vreg);
-
-       if (dump_file)
-         fprintf (dump_file, "  Replaced r%d with r%d in insn %d\n",
-                  regno, REGNO (vreg), INSN_UID (insn));
-      }
-}
-
-/* Convert all definitions of register REGNO
-   and fix its uses.  Scalar copies may be created
-   in case register is used in not convertible insn.  */
-
-void
-general_scalar_chain::convert_reg (unsigned regno)
-{
-  bool scalar_copy = bitmap_bit_p (defs_conv, regno);
-  rtx reg = regno_reg_rtx[regno];
-  rtx scopy = NULL_RTX;
-  df_ref ref;
-  bitmap conv;
-
-  conv = BITMAP_ALLOC (NULL);
-  bitmap_copy (conv, insns);
-
-  if (scalar_copy)
-    scopy = gen_reg_rtx (smode);
+  defs_map.put (reg, vreg);
 
+  /* For each insn defining REGNO, see if it is defined by an insn
+     not part of the chain but with uses in insns part of the chain
+     and insert a copy in that case.  */
   for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
     {
-      rtx_insn *insn = DF_REF_INSN (ref);
-      rtx def_set = single_set (insn);
-      rtx src = SET_SRC (def_set);
-      rtx reg = DF_REF_REG (ref);
+      if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
+       continue;
+      df_link *use;
+      for (use = DF_REF_CHAIN (ref); use; use = use->next)
+       if (!DF_REF_REG_MEM_P (use->ref)
+           && bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
+         break;
+      if (!use)
+       continue;
 
-      if (!MEM_P (src))
+      start_sequence ();
+      if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
        {
-         replace_with_subreg_in_insn (insn, reg, reg);
-         bitmap_clear_bit (conv, INSN_UID (insn));
+         rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
+         if (smode == DImode && !TARGET_64BIT)
+           {
+             emit_move_insn (adjust_address (tmp, SImode, 0),
+                             gen_rtx_SUBREG (SImode, reg, 0));
+             emit_move_insn (adjust_address (tmp, SImode, 4),
+                             gen_rtx_SUBREG (SImode, reg, 4));
+           }
+         else
+           emit_move_insn (copy_rtx (tmp), reg);
+         emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
+                                 gen_gpr_to_xmm_move_src (vmode, tmp)));
        }
-
-      if (scalar_copy)
+      else if (!TARGET_64BIT && smode == DImode)
        {
-         start_sequence ();
-         if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
+         if (TARGET_SSE4_1)
            {
-             rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
-             emit_move_insn (tmp, reg);
-             if (!TARGET_64BIT && smode == DImode)
-               {
-                 emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
-                                 adjust_address (tmp, SImode, 0));
-                 emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
-                                 adjust_address (tmp, SImode, 4));
-               }
-             else
-               emit_move_insn (scopy, copy_rtx (tmp));
+             emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
+                                         CONST0_RTX (V4SImode),
+                                         gen_rtx_SUBREG (SImode, reg, 0)));
+             emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
+                                           gen_rtx_SUBREG (V4SImode, vreg, 0),
+                                           gen_rtx_SUBREG (SImode, reg, 4),
+                                           GEN_INT (2)));
            }
-         else if (!TARGET_64BIT && smode == DImode)
+         else
            {
-             if (TARGET_SSE4_1)
-               {
-                 rtx tmp = gen_rtx_PARALLEL (VOIDmode,
-                                             gen_rtvec (1, const0_rtx));
-                 emit_insn
-                   (gen_rtx_SET
-                      (gen_rtx_SUBREG (SImode, scopy, 0),
-                       gen_rtx_VEC_SELECT (SImode,
-                                           gen_rtx_SUBREG (V4SImode, reg, 0),
-                                           tmp)));
-
-                 tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
-                 emit_insn
-                   (gen_rtx_SET
-                      (gen_rtx_SUBREG (SImode, scopy, 4),
-                       gen_rtx_VEC_SELECT (SImode,
-                                           gen_rtx_SUBREG (V4SImode, reg, 0),
-                                           tmp)));
-               }
-             else
-               {
-                 rtx vcopy = gen_reg_rtx (V2DImode);
-                 emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, reg, 0));
-                 emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
-                                 gen_rtx_SUBREG (SImode, vcopy, 0));
-                 emit_move_insn (vcopy,
-                                 gen_rtx_LSHIFTRT (V2DImode,
-                                                   vcopy, GEN_INT (32)));
-                 emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
-                                 gen_rtx_SUBREG (SImode, vcopy, 0));
-               }
+             rtx tmp = gen_reg_rtx (DImode);
+             emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
+                                         CONST0_RTX (V4SImode),
+                                         gen_rtx_SUBREG (SImode, reg, 0)));
+             emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
+                                         CONST0_RTX (V4SImode),
+                                         gen_rtx_SUBREG (SImode, reg, 4)));
+             emit_insn (gen_vec_interleave_lowv4si
+                        (gen_rtx_SUBREG (V4SImode, vreg, 0),
+                         gen_rtx_SUBREG (V4SImode, vreg, 0),
+                         gen_rtx_SUBREG (V4SImode, tmp, 0)));
            }
-         else
-           emit_move_insn (scopy, reg);
-
-         rtx_insn *seq = get_insns ();
-         end_sequence ();
-         emit_conversion_insns (seq, insn);
-
-         if (dump_file)
-           fprintf (dump_file,
-                    "  Copied r%d to a scalar register r%d for insn %d\n",
-                    regno, REGNO (scopy), INSN_UID (insn));
        }
-    }
-
-  for (ref = DF_REG_USE_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
-    if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
-      {
-       if (bitmap_bit_p (conv, DF_REF_INSN_UID (ref)))
-         {
-           rtx_insn *insn = DF_REF_INSN (ref);
+      else
+       emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
+                               gen_gpr_to_xmm_move_src (vmode, reg)));
+      rtx_insn *seq = get_insns ();
+      end_sequence ();
+      rtx_insn *insn = DF_REF_INSN (ref);
+      emit_conversion_insns (seq, insn);
 
-           rtx def_set = single_set (insn);
-           gcc_assert (def_set);
+      if (dump_file)
+       fprintf (dump_file,
+                "  Copied r%d to a vector register r%d for insn %d\n",
+                regno, REGNO (vreg), INSN_UID (insn));
+    }
+}
 
-           rtx src = SET_SRC (def_set);
-           rtx dst = SET_DEST (def_set);
+/* Copy the definition SRC of INSN inside the chain to DST for
+   scalar uses outside of the chain.  */
 
-           if (!MEM_P (dst) || !REG_P (src))
-             replace_with_subreg_in_insn (insn, reg, reg);
+void
+general_scalar_chain::convert_reg (rtx_insn *insn, rtx dst, rtx src)
+{
+  start_sequence ();
+  if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
+    {
+      rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
+      emit_move_insn (tmp, src);
+      if (!TARGET_64BIT && smode == DImode)
+       {
+         emit_move_insn (gen_rtx_SUBREG (SImode, dst, 0),
+                         adjust_address (tmp, SImode, 0));
+         emit_move_insn (gen_rtx_SUBREG (SImode, dst, 4),
+                         adjust_address (tmp, SImode, 4));
+       }
+      else
+       emit_move_insn (dst, copy_rtx (tmp));
+    }
+  else if (!TARGET_64BIT && smode == DImode)
+    {
+      if (TARGET_SSE4_1)
+       {
+         rtx tmp = gen_rtx_PARALLEL (VOIDmode,
+                                     gen_rtvec (1, const0_rtx));
+         emit_insn
+             (gen_rtx_SET
+              (gen_rtx_SUBREG (SImode, dst, 0),
+               gen_rtx_VEC_SELECT (SImode,
+                                   gen_rtx_SUBREG (V4SImode, src, 0),
+                                   tmp)));
+
+         tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
+         emit_insn
+             (gen_rtx_SET
+              (gen_rtx_SUBREG (SImode, dst, 4),
+               gen_rtx_VEC_SELECT (SImode,
+                                   gen_rtx_SUBREG (V4SImode, src, 0),
+                                   tmp)));
+       }
+      else
+       {
+         rtx vcopy = gen_reg_rtx (V2DImode);
+         emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, src, 0));
+         emit_move_insn (gen_rtx_SUBREG (SImode, dst, 0),
+                         gen_rtx_SUBREG (SImode, vcopy, 0));
+         emit_move_insn (vcopy,
+                         gen_rtx_LSHIFTRT (V2DImode,
+                                           vcopy, GEN_INT (32)));
+         emit_move_insn (gen_rtx_SUBREG (SImode, dst, 4),
+                         gen_rtx_SUBREG (SImode, vcopy, 0));
+       }
+    }
+  else
+    emit_move_insn (dst, src);
 
-           bitmap_clear_bit (conv, INSN_UID (insn));
-         }
-      }
-    /* Skip debug insns and uninitialized uses.  */
-    else if (DF_REF_CHAIN (ref)
-            && NONDEBUG_INSN_P (DF_REF_INSN (ref)))
-      {
-       gcc_assert (scopy);
-       replace_rtx (DF_REF_INSN (ref), reg, scopy);
-       df_insn_rescan (DF_REF_INSN (ref));
-      }
+  rtx_insn *seq = get_insns ();
+  end_sequence ();
+  emit_conversion_insns (seq, insn);
 
-  BITMAP_FREE (conv);
+  if (dump_file)
+    fprintf (dump_file,
+            "  Copied r%d to a scalar register r%d for insn %d\n",
+            REGNO (src), REGNO (dst), INSN_UID (insn));
 }
 
 /* Convert operand OP in INSN.  We should handle
@@ -921,16 +824,6 @@ general_scalar_chain::convert_op (rtx *o
     }
   else if (REG_P (*op))
     {
-      /* We may have not converted register usage in case
-        this register has no definition.  Otherwise it
-        should be converted in convert_reg.  */
-      df_ref ref;
-      FOR_EACH_INSN_USE (ref, insn)
-       if (DF_REF_REGNO (ref) == REGNO (*op))
-         {
-           gcc_assert (!DF_REF_CHAIN (ref));
-           break;
-         }
       *op = gen_rtx_SUBREG (vmode, *op, 0);
     }
   else if (CONST_INT_P (*op))
@@ -980,6 +873,32 @@ general_scalar_chain::convert_insn (rtx_
   rtx dst = SET_DEST (def_set);
   rtx subreg;
 
+  /* Generate copies for out-of-chain uses of defs.  */
+  for (df_ref ref = DF_INSN_DEFS (insn); ref; ref = DF_REF_NEXT_LOC (ref))
+    if (bitmap_bit_p (defs_conv, DF_REF_REGNO (ref)))
+      {
+       df_link *use;
+       for (use = DF_REF_CHAIN (ref); use; use = use->next)
+         if (DF_REF_REG_MEM_P (use->ref)
+             || !bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
+           break;
+       if (use)
+         convert_reg (insn, DF_REF_REG (ref),
+                      *defs_map.get (regno_reg_rtx [DF_REF_REGNO (ref)]));
+      }
+
+  /* Replace uses in this insn with the defs we use in the chain.  */
+  for (df_ref ref = DF_INSN_USES (insn); ref; ref = DF_REF_NEXT_LOC (ref))
+    if (!DF_REF_REG_MEM_P (ref))
+      if (rtx *vreg = defs_map.get (regno_reg_rtx[DF_REF_REGNO (ref)]))
+       {
+         /* Also update a corresponding REG_DEAD note.  */
+         rtx note = find_reg_note (insn, REG_DEAD, DF_REF_REG (ref));
+         if (note)
+           XEXP (note, 0) = *vreg;
+         *DF_REF_REAL_LOC (ref) = *vreg;
+       }
+
   if (MEM_P (dst) && !REG_P (src))
     {
       /* There are no scalar integer instructions and therefore
@@ -988,6 +907,20 @@ general_scalar_chain::convert_insn (rtx_
       emit_conversion_insns (gen_move_insn (dst, tmp), insn);
       dst = gen_rtx_SUBREG (vmode, tmp, 0);
     }
+  else if (REG_P (dst))
+    {
+      /* Replace the definition with a SUBREG to the definition we
+         use inside the chain.  */
+      rtx *vdef = defs_map.get (dst);
+      if (vdef)
+       dst = *vdef;
+      dst = gen_rtx_SUBREG (vmode, dst, 0);
+      /* IRA doesn't like to have REG_EQUAL/EQUIV notes when the SET_DEST
+         is a non-REG_P.  So kill those off.  */
+      rtx note = find_reg_equal_equiv_note (insn);
+      if (note)
+       remove_note (insn, note);
+    }
 
   switch (GET_CODE (src))
     {
@@ -1045,20 +978,15 @@ general_scalar_chain::convert_insn (rtx_
     case COMPARE:
       src = SUBREG_REG (XEXP (XEXP (src, 0), 0));
 
-      gcc_assert ((REG_P (src) && GET_MODE (src) == DImode)
-                 || (SUBREG_P (src) && GET_MODE (src) == V2DImode));
-
-      if (REG_P (src))
-       subreg = gen_rtx_SUBREG (V2DImode, src, 0);
-      else
-       subreg = copy_rtx_if_shared (src);
+      gcc_assert (REG_P (src) && GET_MODE (src) == DImode);
+      subreg = gen_rtx_SUBREG (V2DImode, src, 0);
       emit_insn_before (gen_vec_interleave_lowv2di (copy_rtx_if_shared (subreg),
                                                    copy_rtx_if_shared (subreg),
                                                    copy_rtx_if_shared (subreg)),
                        insn);
       dst = gen_rtx_REG (CCmode, FLAGS_REG);
-      src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (src),
-                                              copy_rtx_if_shared (src)),
+      src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (subreg),
+                                              copy_rtx_if_shared (subreg)),
                            UNSPEC_PTEST);
       break;
 
@@ -1217,16 +1145,15 @@ timode_scalar_chain::convert_insn (rtx_i
   df_insn_rescan (insn);
 }
 
+/* Generate copies from defs used by the chain but not defined therein.
+   Also populates defs_map which is used later by convert_insn.  */
+
 void
 general_scalar_chain::convert_registers ()
 {
   bitmap_iterator bi;
   unsigned id;
-
-  EXECUTE_IF_SET_IN_BITMAP (defs, 0, id, bi)
-    convert_reg (id);
-
-  EXECUTE_IF_AND_COMPL_IN_BITMAP (defs_conv, defs, 0, id, bi)
+  EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, id, bi)
     make_vector_copies (id);
 }
 
Index: gcc/config/i386/i386-features.h
===================================================================
--- gcc/config/i386/i386-features.h     (revision 274843)
+++ gcc/config/i386/i386-features.h     (working copy)
@@ -171,12 +171,11 @@ class general_scalar_chain : public scal
     : scalar_chain (smode_, vmode_) {}
   int compute_convert_gain ();
  private:
+  hash_map<rtx, rtx> defs_map;
   void mark_dual_mode_def (df_ref def);
-  rtx replace_with_subreg (rtx x, rtx reg, rtx subreg);
-  void replace_with_subreg_in_insn (rtx_insn *insn, rtx reg, rtx subreg);
   void convert_insn (rtx_insn *insn);
   void convert_op (rtx *op, rtx_insn *insn);
-  void convert_reg (unsigned regno);
+  void convert_reg (rtx_insn *insn, rtx dst, rtx src);
   void make_vector_copies (unsigned regno);
   void convert_registers ();
   int vector_const_cost (rtx exp);
