On Mon, Aug 26, 2019 at 10:40 AM Richard Biener <rguent...@suse.de> wrote:
>
> On Fri, 23 Aug 2019, Richard Biener wrote:
>
> > On Fri, 23 Aug 2019, Richard Biener wrote:
> >
> > > On Fri, 23 Aug 2019, Uros Bizjak wrote:
> > >
> > > > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener <rguent...@suse.de> wrote:
> > > > >
> > > > >
> > > > > This fixes quadraticness in STV and makes
> > > > >
> > > > >  machine dep reorg                  :  89.07 ( 95%)   0.02 ( 18%)  89.10 ( 95%)      54 kB (  0%)
> > > > >
> > > > > drop to zero.  Does anybody remember why it is the way it is now?
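For readers following along: the add_insn hunk in the patch further down replaces a walk over every def of a register with a single analyze_register_chain call per ref.  A rough way to see why the old shape can go quadratic is to count calls in a toy model.  This is only a sketch, not GCC code; the function names are made up for illustration, and it assumes the worst case where all n insns defining the same pseudo end up being added to a chain.

```cpp
#include <cassert>
#include <cstddef>

// Toy model, not GCC code.  N insns each define the same pseudo and
// each of them is passed to add_insn exactly once.

// Old shape: every add_insn call walks the register's whole def chain.
std::size_t calls_walking_all_defs (std::size_t n)
{
  std::size_t calls = 0;
  for (std::size_t insn = 0; insn < n; ++insn)
    for (std::size_t def = 0; def < n; ++def)
      ++calls;
  return calls;
}

// New shape: every add_insn call looks only at its own def.
std::size_t calls_own_def_only (std::size_t n)
{
  std::size_t calls = 0;
  for (std::size_t insn = 0; insn < n; ++insn)
    ++calls;
  return calls;
}
```

With 1000 defs of one pseudo the first variant makes a million calls in this model and the second makes a thousand.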
> > > > >
> > > > > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> > > > >
> > > > > OK?
> > > >
> > > > Looking at the PR, comment #3 [1], I assume this patch is obsolete
> > > > and will be updated?
> > > >
> > > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3
> > >
> > > Yes.  I'm still learning how STV operates (and learning DF and RTL...).
> > > The following is a rewrite of the non-TImode chain conversion
> > > according to how I think it should operate, also allowing the hunk
> > > that fixes the compile-time and fixing PR91527 along the way
> > > (which I ran into during more extensive testing of the patch myself).
> > >
> > > So, compared to the state before (which I still do not 100% understand),
> > > we now do the following.  Chain detection works as before, including
> > > recording of all defs (both defined by the insns in the chain and
> > > by insns outside) that need copy-in or copy-out operations.
> > >
> > > But then the patch changes things so as to guarantee that
> > > after the conversion all uses/defs of a pseudo are either
> > > of the (subreg:Vmode ..) form or of the original scalar form.
> > > In particular it avoids the need to change any insns that
> > > are not part of the chain (besides emitting the extra copy
> > > instructions).  To achieve this, for all defs that were marked as
> > > needing copies (thus they have uses/defs both in the chain and
> > > outside) the chain will use a new pseudo that we copy to from
> > > scalar sources and that we copy from for scalar uses.  There's the
> > > new defs_map which records the mapping of old to new reg.  Pseudos
> > > that are only used in the chain are not remapped.
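The remapping rule described above can be pictured with a small standalone model.  This is a sketch under simplifying assumptions, not the patch's code: Ref and build_defs_map are hypothetical names, registers and insns are plain integers, and a register is remapped exactly when it is referenced both inside and outside the chain.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy model, not GCC code: a reference (def or use) of REG in INSN.
struct Ref { int insn; int reg; };

// Return old reg -> new pseudo for every register that is referenced
// both by insns in CHAIN and by insns outside of it.  Registers that
// only appear inside the chain are left unmapped.
std::map<int, int>
build_defs_map (const std::set<int> &chain, const std::vector<Ref> &refs,
                int next_pseudo)
{
  std::set<int> inside, outside;
  for (const Ref &r : refs)
    (chain.count (r.insn) ? inside : outside).insert (r.reg);

  std::map<int, int> defs_map;
  for (int reg : inside)
    if (outside.count (reg))
      defs_map[reg] = next_pseudo++;
  return defs_map;
}
```

For a chain of insns 10 and 11 where r90 is defined outside and used inside, r91 lives entirely inside, and r92 is defined inside and used outside, only r90 and r92 receive new pseudos in this model.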
> > >
> > > The conversion itself then happens in two stages.  First,
> > > in make_vector_copies, we emit the copy-in insns and
> > > allocate all pseudos we need.  Then the rest of the
> > > conversion happens fully inside of convert_insn, where
> > > we generate the copy-outs of the insn's def, replace
> > > defs and uses according to the mapping and replace uses
> > > and defs with the (subreg:Vmode ..) style.
> > >
> > > For PR91527 IRA doesn't like the REG_EQUIV note in
> > >
> > > (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
> > >         (subreg:V4SI (reg:SI 100) 0))
> > > "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4
> > > 1248 {movv4si_internal}
> > >      (expr_list:REG_DEAD (reg:SI 100)
> > >         (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
> > >                     (const_int 16 [0x10])) [1 c+0 S4 A64])
> > >             (nil))))
> > >
> > > because the SET_DEST is not a REG_P.  I'm not sure if this
> > > is invalid RTL; the docs say SET_DEST might be a strict_low_part
> > > or a zero_extract but don't mention a subreg.  So I opted
> > > to simply remove equal/equiv notes on insns we convert,
> > > and since the above has a REG_DEAD note I took the liberty
> > > to update that according to the mapping (which would not have
> > > been needed before this patch) rather than dropping it.
> > >
> > > Bootstrapped with and without --with-march=westmere (to get
> > > some STV coverage, this included all languages) on
> > > x86_64-unknown-linux-gnu, testing in progress.
> > >
> > > OK if testing succeeds?
> >
> > Testing revealed I made an error in general_scalar_chain::convert_insn:
> > I failed to move the SET_SRC extraction below the replacement with
> > the defs map.  This showed up as 4 execute FAILs in 32bit Fortran
> > testing (only).  Fixed by moving down the whole def_set/src/dst
> > extraction.
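One plausible picture of this ordering problem, as a toy model rather than the actual convert_insn code: if the source operand is read before the uses are rewritten through the map, the later code keeps working with the stale, unmapped register.  ToyInsn and both function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <map>

// Toy model, not GCC code: an insn of the form "dst = src" over
// register numbers.
struct ToyInsn { int dst; int src; };

// Problematic order: the source is read first, the remap happens later,
// so the value handed on is the old register.
int src_extracted_before_remap (ToyInsn insn,
                                const std::map<int, int> &defs_map)
{
  int src = insn.src;                    // stale copy taken too early
  auto it = defs_map.find (insn.src);
  if (it != defs_map.end ())
    insn.src = it->second;               // the remap does not reach SRC
  return src;
}

// Fixed order: remap first, then read the source.
int src_extracted_after_remap (ToyInsn insn,
                               const std::map<int, int> &defs_map)
{
  auto it = defs_map.find (insn.src);
  if (it != defs_map.end ())
    insn.src = it->second;
  return insn.src;
}
```

In this model, with r90 mapped to r100 and an insn "r91 = r90", the first function still reports r90 while the second reports r100.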
> >
> > Re-testing on x86_64-unknown-linux-gnu.
>
> Bootstrapped / tested on x86_64-unknown-linux-gnu.  The "no-costmodel"
> run runs into the latent PR91528 building 32bit libada in stage3
> for a few sources; I've manually built those with -mno-stv and
> the bootstrap finishes successfully.  I hope HJ can help with this
> dynamic stack-alignment issue.
>
> So - OK for trunk?
>
> As a followup we can now remove general_remove_non_convertible_regs
> because we can handle defs that cannot be converted just fine
> with the patch since we're splitting live-ranges of all defs at
> the chain boundary.
>
> Thanks,
> Richard.
>
> > Updated patch below.  I'm feeling adventurous and will run
> > the "westmere" bootstrap with costing disabled (aka always
> > convert detected chains...).
> >
> > Richard.
> >
> > 2019-08-23  Richard Biener  <rguent...@suse.de>
> >
> >       PR target/91522
> >       PR target/91527
> >       * config/i386/i386-features.h (general_scalar_chain::defs_map):
> >       New member.
> >       (general_scalar_chain::replace_with_subreg): Remove.
> >       (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
> >       (general_scalar_chain::convert_reg): Adjust signature.
> >       * config/i386/i386-features.c (scalar_chain::add_insn): Do not
> >       iterate over all defs of a reg.
> >       (general_scalar_chain::replace_with_subreg): Remove.
> >       (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
> >       (general_scalar_chain::make_vector_copies): Populate defs_map,
> >       place copy only after defs that are used as vectors in the chain.
> >       (general_scalar_chain::convert_reg): Emit a copy for a specific
> >       def in a specific instruction.
> >       (general_scalar_chain::convert_op): All reg uses are converted here.
> >       (general_scalar_chain::convert_insn): Emit copies for scalar
> >       uses of defs here.  Replace uses with the copies we created.
> >       Replace and convert the def.  Adjust REG_DEAD notes, remove
> >       REG_EQUIV/EQUAL notes.
> >       (general_scalar_chain::convert_registers): Only handle copies
> >       into the chain here.

Rubberstamped with LGTM. It looks like you are the master of this domain now ;)

Thanks,
Uros.

> >
> >
> > Index: gcc/config/i386/i386-features.c
> > ===================================================================
> > --- gcc/config/i386/i386-features.c   (revision 274843)
> > +++ gcc/config/i386/i386-features.c   (working copy)
> > @@ -416,13 +416,9 @@ scalar_chain::add_insn (bitmap candidate
> >       iterates over all refs to look for dual-mode regs.  Instead this
> >       should be done separately for all regs mentioned in the chain once.  */
> >    df_ref ref;
> > -  df_ref def;
> >    for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
> >      if (!HARD_REGISTER_P (DF_REF_REG (ref)))
> > -      for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
> > -        def;
> > -        def = DF_REF_NEXT_REG (def))
> > -     analyze_register_chain (candidates, def);
> > +      analyze_register_chain (candidates, ref);
> >    for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
> >      if (!DF_REF_REG_MEM_P (ref))
> >        analyze_register_chain (candidates, ref);
> > @@ -605,42 +601,6 @@ general_scalar_chain::compute_convert_ga
> >    return gain;
> >  }
> >
> > -/* Replace REG in X with a V2DI subreg of NEW_REG.  */
> > -
> > -rtx
> > -general_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg)
> > -{
> > -  if (x == reg)
> > -    return gen_rtx_SUBREG (vmode, new_reg, 0);
> > -
> > -  /* But not in memory addresses.  */
> > -  if (MEM_P (x))
> > -    return x;
> > -
> > -  const char *fmt = GET_RTX_FORMAT (GET_CODE (x));
> > -  int i, j;
> > -  for (i = GET_RTX_LENGTH (GET_CODE (x)) - 1; i >= 0; i--)
> > -    {
> > -      if (fmt[i] == 'e')
> > -     XEXP (x, i) = replace_with_subreg (XEXP (x, i), reg, new_reg);
> > -      else if (fmt[i] == 'E')
> > -     for (j = XVECLEN (x, i) - 1; j >= 0; j--)
> > -       XVECEXP (x, i, j) = replace_with_subreg (XVECEXP (x, i, j),
> > -                                                reg, new_reg);
> > -    }
> > -
> > -  return x;
> > -}
> > -
> > -/* Replace REG in INSN with a V2DI subreg of NEW_REG.  */
> > -
> > -void
> > -general_scalar_chain::replace_with_subreg_in_insn (rtx_insn *insn,
> > -                                               rtx reg, rtx new_reg)
> > -{
> > -  replace_with_subreg (single_set (insn), reg, new_reg);
> > -}
> > -
> >  /* Insert generated conversion instruction sequence INSNS
> >     after instruction AFTER.  New BB may be required in case
> >     instruction has EH region attached.  */
> > @@ -691,204 +651,147 @@ general_scalar_chain::make_vector_copies
> >    rtx vreg = gen_reg_rtx (smode);
> >    df_ref ref;
> >
> > -  for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
> > -    if (!bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
> > -      {
> > -     start_sequence ();
> > -     if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
> > -       {
> > -         rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
> > -         if (smode == DImode && !TARGET_64BIT)
> > -           {
> > -             emit_move_insn (adjust_address (tmp, SImode, 0),
> > -                             gen_rtx_SUBREG (SImode, reg, 0));
> > -             emit_move_insn (adjust_address (tmp, SImode, 4),
> > -                             gen_rtx_SUBREG (SImode, reg, 4));
> > -           }
> > -         else
> > -           emit_move_insn (copy_rtx (tmp), reg);
> > -         emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
> > -                                 gen_gpr_to_xmm_move_src (vmode, tmp)));
> > -       }
> > -     else if (!TARGET_64BIT && smode == DImode)
> > -       {
> > -         if (TARGET_SSE4_1)
> > -           {
> > -             emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
> > -                                         CONST0_RTX (V4SImode),
> > -                                         gen_rtx_SUBREG (SImode, reg, 0)));
> > -             emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
> > -                                           gen_rtx_SUBREG (V4SImode, vreg, 0),
> > -                                           gen_rtx_SUBREG (SImode, reg, 4),
> > -                                           GEN_INT (2)));
> > -           }
> > -         else
> > -           {
> > -             rtx tmp = gen_reg_rtx (DImode);
> > -             emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
> > -                                         CONST0_RTX (V4SImode),
> > -                                         gen_rtx_SUBREG (SImode, reg, 0)));
> > -             emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
> > -                                         CONST0_RTX (V4SImode),
> > -                                         gen_rtx_SUBREG (SImode, reg, 4)));
> > -             emit_insn (gen_vec_interleave_lowv4si
> > -                        (gen_rtx_SUBREG (V4SImode, vreg, 0),
> > -                         gen_rtx_SUBREG (V4SImode, vreg, 0),
> > -                         gen_rtx_SUBREG (V4SImode, tmp, 0)));
> > -           }
> > -       }
> > -     else
> > -       emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
> > -                               gen_gpr_to_xmm_move_src (vmode, reg)));
> > -     rtx_insn *seq = get_insns ();
> > -     end_sequence ();
> > -     rtx_insn *insn = DF_REF_INSN (ref);
> > -     emit_conversion_insns (seq, insn);
> > -
> > -     if (dump_file)
> > -       fprintf (dump_file,
> > -                "  Copied r%d to a vector register r%d for insn %d\n",
> > -                regno, REGNO (vreg), INSN_UID (insn));
> > -      }
> > -
> > -  for (ref = DF_REG_USE_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
> > -    if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
> > -      {
> > -     rtx_insn *insn = DF_REF_INSN (ref);
> > -     replace_with_subreg_in_insn (insn, reg, vreg);
> > -
> > -     if (dump_file)
> > -       fprintf (dump_file, "  Replaced r%d with r%d in insn %d\n",
> > -                regno, REGNO (vreg), INSN_UID (insn));
> > -      }
> > -}
> > -
> > -/* Convert all definitions of register REGNO
> > -   and fix its uses.  Scalar copies may be created
> > -   in case register is used in not convertible insn.  */
> > -
> > -void
> > -general_scalar_chain::convert_reg (unsigned regno)
> > -{
> > -  bool scalar_copy = bitmap_bit_p (defs_conv, regno);
> > -  rtx reg = regno_reg_rtx[regno];
> > -  rtx scopy = NULL_RTX;
> > -  df_ref ref;
> > -  bitmap conv;
> > -
> > -  conv = BITMAP_ALLOC (NULL);
> > -  bitmap_copy (conv, insns);
> > -
> > -  if (scalar_copy)
> > -    scopy = gen_reg_rtx (smode);
> > +  defs_map.put (reg, vreg);
> >
> > +  /* For each insn defining REGNO, see if it is defined by an insn
> > +     not part of the chain but with uses in insns part of the chain
> > +     and insert a copy in that case.  */
> >    for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
> >      {
> > -      rtx_insn *insn = DF_REF_INSN (ref);
> > -      rtx def_set = single_set (insn);
> > -      rtx src = SET_SRC (def_set);
> > -      rtx reg = DF_REF_REG (ref);
> > +      if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
> > +     continue;
> > +      df_link *use;
> > +      for (use = DF_REF_CHAIN (ref); use; use = use->next)
> > +     if (!DF_REF_REG_MEM_P (use->ref)
> > +         && bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
> > +       break;
> > +      if (!use)
> > +     continue;
> >
> > -      if (!MEM_P (src))
> > +      start_sequence ();
> > +      if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
> >       {
> > -       replace_with_subreg_in_insn (insn, reg, reg);
> > -       bitmap_clear_bit (conv, INSN_UID (insn));
> > +       rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
> > +       if (smode == DImode && !TARGET_64BIT)
> > +         {
> > +           emit_move_insn (adjust_address (tmp, SImode, 0),
> > +                           gen_rtx_SUBREG (SImode, reg, 0));
> > +           emit_move_insn (adjust_address (tmp, SImode, 4),
> > +                           gen_rtx_SUBREG (SImode, reg, 4));
> > +         }
> > +       else
> > +         emit_move_insn (copy_rtx (tmp), reg);
> > +       emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
> > +                               gen_gpr_to_xmm_move_src (vmode, tmp)));
> >       }
> > -
> > -      if (scalar_copy)
> > +      else if (!TARGET_64BIT && smode == DImode)
> >       {
> > -       start_sequence ();
> > -       if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
> > +       if (TARGET_SSE4_1)
> >           {
> > -           rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
> > -           emit_move_insn (tmp, reg);
> > -           if (!TARGET_64BIT && smode == DImode)
> > -             {
> > -               emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
> > -                               adjust_address (tmp, SImode, 0));
> > -               emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
> > -                               adjust_address (tmp, SImode, 4));
> > -             }
> > -           else
> > -             emit_move_insn (scopy, copy_rtx (tmp));
> > +           emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
> > +                                       CONST0_RTX (V4SImode),
> > +                                       gen_rtx_SUBREG (SImode, reg, 0)));
> > +           emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
> > +                                         gen_rtx_SUBREG (V4SImode, vreg, 0),
> > +                                         gen_rtx_SUBREG (SImode, reg, 4),
> > +                                         GEN_INT (2)));
> >           }
> > -       else if (!TARGET_64BIT && smode == DImode)
> > +       else
> >           {
> > -           if (TARGET_SSE4_1)
> > -             {
> > -               rtx tmp = gen_rtx_PARALLEL (VOIDmode,
> > -                                           gen_rtvec (1, const0_rtx));
> > -               emit_insn
> > -                 (gen_rtx_SET
> > -                    (gen_rtx_SUBREG (SImode, scopy, 0),
> > -                     gen_rtx_VEC_SELECT (SImode,
> > -                                         gen_rtx_SUBREG (V4SImode, reg, 0),
> > -                                         tmp)));
> > -
> > -               tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
> > -               emit_insn
> > -                 (gen_rtx_SET
> > -                    (gen_rtx_SUBREG (SImode, scopy, 4),
> > -                     gen_rtx_VEC_SELECT (SImode,
> > -                                         gen_rtx_SUBREG (V4SImode, reg, 0),
> > -                                         tmp)));
> > -             }
> > -           else
> > -             {
> > -               rtx vcopy = gen_reg_rtx (V2DImode);
> > -               emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, reg, 0));
> > -               emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
> > -                               gen_rtx_SUBREG (SImode, vcopy, 0));
> > -               emit_move_insn (vcopy,
> > -                               gen_rtx_LSHIFTRT (V2DImode,
> > -                                                 vcopy, GEN_INT (32)));
> > -               emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
> > -                               gen_rtx_SUBREG (SImode, vcopy, 0));
> > -             }
> > +           rtx tmp = gen_reg_rtx (DImode);
> > +           emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
> > +                                       CONST0_RTX (V4SImode),
> > +                                       gen_rtx_SUBREG (SImode, reg, 0)));
> > +           emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
> > +                                       CONST0_RTX (V4SImode),
> > +                                       gen_rtx_SUBREG (SImode, reg, 4)));
> > +           emit_insn (gen_vec_interleave_lowv4si
> > +                      (gen_rtx_SUBREG (V4SImode, vreg, 0),
> > +                       gen_rtx_SUBREG (V4SImode, vreg, 0),
> > +                       gen_rtx_SUBREG (V4SImode, tmp, 0)));
> >           }
> > -       else
> > -         emit_move_insn (scopy, reg);
> > -
> > -       rtx_insn *seq = get_insns ();
> > -       end_sequence ();
> > -       emit_conversion_insns (seq, insn);
> > -
> > -       if (dump_file)
> > -         fprintf (dump_file,
> > -                  "  Copied r%d to a scalar register r%d for insn %d\n",
> > -                  regno, REGNO (scopy), INSN_UID (insn));
> >       }
> > -    }
> > -
> > -  for (ref = DF_REG_USE_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
> > -    if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
> > -      {
> > -     if (bitmap_bit_p (conv, DF_REF_INSN_UID (ref)))
> > -       {
> > -         rtx_insn *insn = DF_REF_INSN (ref);
> > +      else
> > +     emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
> > +                             gen_gpr_to_xmm_move_src (vmode, reg)));
> > +      rtx_insn *seq = get_insns ();
> > +      end_sequence ();
> > +      rtx_insn *insn = DF_REF_INSN (ref);
> > +      emit_conversion_insns (seq, insn);
> >
> > -         rtx def_set = single_set (insn);
> > -         gcc_assert (def_set);
> > +      if (dump_file)
> > +     fprintf (dump_file,
> > +              "  Copied r%d to a vector register r%d for insn %d\n",
> > +              regno, REGNO (vreg), INSN_UID (insn));
> > +    }
> > +}
> >
> > -         rtx src = SET_SRC (def_set);
> > -         rtx dst = SET_DEST (def_set);
> > +/* Copy the definition SRC of INSN inside the chain to DST for
> > +   scalar uses outside of the chain.  */
> >
> > -         if (!MEM_P (dst) || !REG_P (src))
> > -           replace_with_subreg_in_insn (insn, reg, reg);
> > +void
> > +general_scalar_chain::convert_reg (rtx_insn *insn, rtx dst, rtx src)
> > +{
> > +  start_sequence ();
> > +  if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
> > +    {
> > +      rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
> > +      emit_move_insn (tmp, src);
> > +      if (!TARGET_64BIT && smode == DImode)
> > +     {
> > +       emit_move_insn (gen_rtx_SUBREG (SImode, dst, 0),
> > +                       adjust_address (tmp, SImode, 0));
> > +       emit_move_insn (gen_rtx_SUBREG (SImode, dst, 4),
> > +                       adjust_address (tmp, SImode, 4));
> > +     }
> > +      else
> > +     emit_move_insn (dst, copy_rtx (tmp));
> > +    }
> > +  else if (!TARGET_64BIT && smode == DImode)
> > +    {
> > +      if (TARGET_SSE4_1)
> > +     {
> > +       rtx tmp = gen_rtx_PARALLEL (VOIDmode,
> > +                                   gen_rtvec (1, const0_rtx));
> > +       emit_insn
> > +           (gen_rtx_SET
> > +            (gen_rtx_SUBREG (SImode, dst, 0),
> > +             gen_rtx_VEC_SELECT (SImode,
> > +                                 gen_rtx_SUBREG (V4SImode, src, 0),
> > +                                 tmp)));
> > +
> > +       tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
> > +       emit_insn
> > +           (gen_rtx_SET
> > +            (gen_rtx_SUBREG (SImode, dst, 4),
> > +             gen_rtx_VEC_SELECT (SImode,
> > +                                 gen_rtx_SUBREG (V4SImode, src, 0),
> > +                                 tmp)));
> > +     }
> > +      else
> > +     {
> > +       rtx vcopy = gen_reg_rtx (V2DImode);
> > +       emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, src, 0));
> > +       emit_move_insn (gen_rtx_SUBREG (SImode, dst, 0),
> > +                       gen_rtx_SUBREG (SImode, vcopy, 0));
> > +       emit_move_insn (vcopy,
> > +                       gen_rtx_LSHIFTRT (V2DImode,
> > +                                         vcopy, GEN_INT (32)));
> > +       emit_move_insn (gen_rtx_SUBREG (SImode, dst, 4),
> > +                       gen_rtx_SUBREG (SImode, vcopy, 0));
> > +     }
> > +    }
> > +  else
> > +    emit_move_insn (dst, src);
> >
> > -         bitmap_clear_bit (conv, INSN_UID (insn));
> > -       }
> > -      }
> > -    /* Skip debug insns and uninitialized uses.  */
> > -    else if (DF_REF_CHAIN (ref)
> > -          && NONDEBUG_INSN_P (DF_REF_INSN (ref)))
> > -      {
> > -     gcc_assert (scopy);
> > -     replace_rtx (DF_REF_INSN (ref), reg, scopy);
> > -     df_insn_rescan (DF_REF_INSN (ref));
> > -      }
> > +  rtx_insn *seq = get_insns ();
> > +  end_sequence ();
> > +  emit_conversion_insns (seq, insn);
> >
> > -  BITMAP_FREE (conv);
> > +  if (dump_file)
> > +    fprintf (dump_file,
> > +          "  Copied r%d to a scalar register r%d for insn %d\n",
> > +          REGNO (src), REGNO (dst), INSN_UID (insn));
> >  }
> >
> >  /* Convert operand OP in INSN.  We should handle
> > @@ -921,16 +824,6 @@ general_scalar_chain::convert_op (rtx *o
> >      }
> >    else if (REG_P (*op))
> >      {
> > -      /* We may have not converted register usage in case
> > -      this register has no definition.  Otherwise it
> > -      should be converted in convert_reg.  */
> > -      df_ref ref;
> > -      FOR_EACH_INSN_USE (ref, insn)
> > -     if (DF_REF_REGNO (ref) == REGNO (*op))
> > -       {
> > -         gcc_assert (!DF_REF_CHAIN (ref));
> > -         break;
> > -       }
> >        *op = gen_rtx_SUBREG (vmode, *op, 0);
> >      }
> >    else if (CONST_INT_P (*op))
> > @@ -975,6 +868,32 @@ general_scalar_chain::convert_op (rtx *o
> >  void
> >  general_scalar_chain::convert_insn (rtx_insn *insn)
> >  {
> > +  /* Generate copies for out-of-chain uses of defs.  */
> > +  for (df_ref ref = DF_INSN_DEFS (insn); ref; ref = DF_REF_NEXT_LOC (ref))
> > +    if (bitmap_bit_p (defs_conv, DF_REF_REGNO (ref)))
> > +      {
> > +     df_link *use;
> > +     for (use = DF_REF_CHAIN (ref); use; use = use->next)
> > +       if (DF_REF_REG_MEM_P (use->ref)
> > +           || !bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
> > +         break;
> > +     if (use)
> > +       convert_reg (insn, DF_REF_REG (ref),
> > +                    *defs_map.get (regno_reg_rtx [DF_REF_REGNO (ref)]));
> > +      }
> > +
> > +  /* Replace uses in this insn with the defs we use in the chain.  */
> > +  for (df_ref ref = DF_INSN_USES (insn); ref; ref = DF_REF_NEXT_LOC (ref))
> > +    if (!DF_REF_REG_MEM_P (ref))
> > +      if (rtx *vreg = defs_map.get (regno_reg_rtx[DF_REF_REGNO (ref)]))
> > +     {
> > +       /* Also update a corresponding REG_DEAD note.  */
> > +       rtx note = find_reg_note (insn, REG_DEAD, DF_REF_REG (ref));
> > +       if (note)
> > +         XEXP (note, 0) = *vreg;
> > +       *DF_REF_REAL_LOC (ref) = *vreg;
> > +     }
> > +
> >    rtx def_set = single_set (insn);
> >    rtx src = SET_SRC (def_set);
> >    rtx dst = SET_DEST (def_set);
> > @@ -988,6 +907,20 @@ general_scalar_chain::convert_insn (rtx_
> >        emit_conversion_insns (gen_move_insn (dst, tmp), insn);
> >        dst = gen_rtx_SUBREG (vmode, tmp, 0);
> >      }
> > +  else if (REG_P (dst))
> > +    {
> > +      /* Replace the definition with a SUBREG to the definition we
> > +         use inside the chain.  */
> > +      rtx *vdef = defs_map.get (dst);
> > +      if (vdef)
> > +     dst = *vdef;
> > +      dst = gen_rtx_SUBREG (vmode, dst, 0);
> > +      /* IRA doesn't like to have REG_EQUAL/EQUIV notes when the SET_DEST
> > +         is a non-REG_P.  So kill those off.  */
> > +      rtx note = find_reg_equal_equiv_note (insn);
> > +      if (note)
> > +     remove_note (insn, note);
> > +    }
> >
> >    switch (GET_CODE (src))
> >      {
> > @@ -1045,20 +978,15 @@ general_scalar_chain::convert_insn (rtx_
> >      case COMPARE:
> >        src = SUBREG_REG (XEXP (XEXP (src, 0), 0));
> >
> > -      gcc_assert ((REG_P (src) && GET_MODE (src) == DImode)
> > -               || (SUBREG_P (src) && GET_MODE (src) == V2DImode));
> > -
> > -      if (REG_P (src))
> > -     subreg = gen_rtx_SUBREG (V2DImode, src, 0);
> > -      else
> > -     subreg = copy_rtx_if_shared (src);
> > +      gcc_assert (REG_P (src) && GET_MODE (src) == DImode);
> > +      subreg = gen_rtx_SUBREG (V2DImode, src, 0);
> >        emit_insn_before (gen_vec_interleave_lowv2di (copy_rtx_if_shared (subreg),
> >                                                   copy_rtx_if_shared (subreg),
> >                                                   copy_rtx_if_shared (subreg)),
> >                       insn);
> >        dst = gen_rtx_REG (CCmode, FLAGS_REG);
> > -      src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (src),
> > -                                            copy_rtx_if_shared (src)),
> > +      src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (subreg),
> > +                                            copy_rtx_if_shared (subreg)),
> >                           UNSPEC_PTEST);
> >        break;
> >
> > @@ -1217,16 +1145,15 @@ timode_scalar_chain::convert_insn (rtx_i
> >    df_insn_rescan (insn);
> >  }
> >
> > +/* Generate copies from defs used by the chain but not defined therein.
> > +   Also populates defs_map which is used later by convert_insn.  */
> > +
> >  void
> >  general_scalar_chain::convert_registers ()
> >  {
> >    bitmap_iterator bi;
> >    unsigned id;
> > -
> > -  EXECUTE_IF_SET_IN_BITMAP (defs, 0, id, bi)
> > -    convert_reg (id);
> > -
> > -  EXECUTE_IF_AND_COMPL_IN_BITMAP (defs_conv, defs, 0, id, bi)
> > +  EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, id, bi)
> >      make_vector_copies (id);
> >  }
> >
> > Index: gcc/config/i386/i386-features.h
> > ===================================================================
> > --- gcc/config/i386/i386-features.h   (revision 274843)
> > +++ gcc/config/i386/i386-features.h   (working copy)
> > @@ -171,12 +171,11 @@ class general_scalar_chain : public scal
> >      : scalar_chain (smode_, vmode_) {}
> >    int compute_convert_gain ();
> >   private:
> > +  hash_map<rtx, rtx> defs_map;
> >    void mark_dual_mode_def (df_ref def);
> > -  rtx replace_with_subreg (rtx x, rtx reg, rtx subreg);
> > -  void replace_with_subreg_in_insn (rtx_insn *insn, rtx reg, rtx subreg);
> >    void convert_insn (rtx_insn *insn);
> >    void convert_op (rtx *op, rtx_insn *insn);
> > -  void convert_reg (unsigned regno);
> > +  void convert_reg (rtx_insn *insn, rtx dst, rtx src);
> >    void make_vector_copies (unsigned regno);
> >    void convert_registers ();
> >    int vector_const_cost (rtx exp);
> >
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 247165 (AG München)
