> On 26 Aug 2019, at 11:06, Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Mon, Aug 26, 2019 at 10:40 AM Richard Biener <rguent...@suse.de> wrote:
>>
>> On Fri, 23 Aug 2019, Richard Biener wrote:
>>
>>> On Fri, 23 Aug 2019, Richard Biener wrote:
>>>
>>>> On Fri, 23 Aug 2019, Uros Bizjak wrote:
>>>>
>>>>> On Thu, Aug 22, 2019 at 3:35 PM Richard Biener <rguent...@suse.de> wrote:
>>>>>>
>>>>>>
>>>>>> This fixes quadraticness in STV and makes
>>>>>>
>>>>>> machine dep reorg : 89.07 ( 95%) 0.02 ( 18%) 89.10 ( 95%) 54 kB ( 0%)
>>>>>>
>>>>>> drop to zero. Does anybody remember why it is the way it is now?
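A rough illustration of where the quadratic cost comes from, written as a standalone sketch rather than GCC code. The names `Def`, `visits_old`, `visits_new`, `total_old` and `total_new` are hypothetical, and the model assumes a single pseudo that is defined once by each of n insns, with the analysis invoked once per insn.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a df_ref: one definition of a pseudo register.
struct Def { std::size_t regno; };

// Old scheme: for every def in the insn, walk *all* defs of that register.
// Returns how many times the per-def analysis would be invoked.
std::size_t visits_old(const std::vector<Def>& insn_defs,
                       const std::vector<std::size_t>& defs_per_reg) {
  std::size_t visits = 0;
  for (const Def& d : insn_defs) {
    assert(d.regno < defs_per_reg.size());
    visits += defs_per_reg[d.regno];
  }
  return visits;
}

// New scheme: analyze only the def that belongs to the insn itself.
std::size_t visits_new(const std::vector<Def>& insn_defs) {
  return insn_defs.size();
}

// Total work when register 0 is defined once by each of n insns.
std::size_t total_old(std::size_t n) {
  const std::vector<std::size_t> defs_per_reg{n};
  std::size_t total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += visits_old({Def{0}}, defs_per_reg);
  return total;
}

std::size_t total_new(std::size_t n) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += visits_new({Def{0}});
  return total;
}
```

Under these assumptions the old loop nest grows with n squared while the new one grows with n, which is consistent with the timing above collapsing once the inner walk over the def chain is removed.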
>>>>>>
>>>>>> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>>>>>>
>>>>>> OK?
>>>>>
>>>>> Looking at the PR, comment #3 [1], I assume this patch is obsolete
>>>>> and will be updated?
>>>>>
>>>>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3
>>>>
>>>> Yes. I'm still learning how STV operates (and learning DF and RTL...).
>>>> The following is a rewrite of the non-TImode chain conversion
>>>> according to how I think it should operate, also allowing the hunk
>>>> that fixes the compile-time and fixing PR91527 on the way
>>>> (which I ran into during more extensive testing of the patch myself).
>>>>
>>>> So compared to the state before which I still do not 100% understand
>>>> we now do the following. Chain detection works as before including
>>>> recording of all defs (both defined by the insns in the chain and
>>>> insns outside) that need copy-in or copy-out operations.
>>>>
>>>> But then the patch changes things so as to guarantee that
>>>> after the conversion all uses/defs of a pseudo are either
>>>> of the (subreg:Vmode ..) form or of the original scalar form.
>>>> In particular it avoids the need to change any insns that
>>>> are not part of the chain (besides emitting the extra copy
>>>> instructions). To achieve this, for all defs that were marked as
>>>> needing copies (thus they have uses/defs both in the chain and
>>>> outside) the chain will use a new pseudo that we copy to from
>>>> scalar sources and that we copy from for scalar uses. There's the
>>>> new defs_map which records the mapping of old to new reg. Pseudos
>>>> that are only used in the chain are not remapped.
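The remapping rule described here can be pictured with a small standalone model. This is only a sketch, not the patch's code: `ChainRegs`, `populate` and `in_chain_reg` are hypothetical names, registers are plain integers, and `std::unordered_map` stands in for GCC's `hash_map<rtx, rtx>`.

```cpp
#include <unordered_map>
#include <unordered_set>

using Reg = int;  // hypothetical: a pseudo register number

struct ChainRegs {
  // Pseudos that have refs both inside and outside the chain.
  std::unordered_set<Reg> defs_conv;
  // Old pseudo -> fresh pseudo used inside the chain.
  std::unordered_map<Reg, Reg> defs_map;
  // Next unused pseudo number (assumed not to collide with existing ones).
  Reg next_pseudo;

  // Allocate one fresh pseudo per register that needs copies.
  void populate() {
    for (Reg r : defs_conv)
      defs_map.emplace(r, next_pseudo++);
  }

  // Register to use for R inside the chain: the remapped pseudo if R
  // crosses the chain boundary, otherwise R itself.
  Reg in_chain_reg(Reg r) const {
    auto it = defs_map.find(r);
    return it == defs_map.end() ? r : it->second;
  }
};
```

In this model a register that only ever appears inside the chain is returned unchanged, so only boundary-crossing pseudos get a second name.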
>>>>
>>>> The conversion itself then happens in two stages. First,
>>>> in make_vector_copies, we emit the copy-in insns and
>>>> allocate all pseudos we need. Then the rest of the
>>>> conversion happens fully inside of convert_insn, where
>>>> we generate the copy-outs of the insn's def, replace
>>>> defs and uses according to the mapping and rewrite uses
>>>> and defs into the (subreg:Vmode ..) style.
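The two stages can be modelled on a toy instruction list. This is a simplified sketch under stated assumptions, not the real pass: `Insn` and `convert` are hypothetical, the reaching-use information is supplied as precomputed flags instead of being read from DF chains, and operand rewriting into subreg form is left out.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified insn record.
struct Insn {
  std::string text;        // printable form, for illustration only
  int def;                 // pseudo defined by this insn
  bool in_chain;           // is the insn part of the converted chain?
  bool def_used_in_chain;  // does this def reach a use inside the chain?
  bool def_used_outside;   // does this def reach a use outside the chain?
};

std::vector<std::string> convert(const std::vector<Insn>& insns,
                                 const std::map<int, int>& defs_map) {
  std::vector<std::vector<std::string>> after(insns.size());

  // Stage 1 (the role of make_vector_copies): after each out-of-chain
  // def that feeds the chain, copy the scalar into the new pseudo.
  for (std::size_t i = 0; i < insns.size(); ++i) {
    const Insn& in = insns[i];
    auto it = defs_map.find(in.def);
    if (!in.in_chain && in.def_used_in_chain && it != defs_map.end())
      after[i].push_back("copy-in r" + std::to_string(it->second) +
                         " <- r" + std::to_string(in.def));
  }

  // Stage 2 (the role of convert_insn): after each in-chain def that is
  // also used outside, copy the new pseudo back to the scalar one.
  for (std::size_t i = 0; i < insns.size(); ++i) {
    const Insn& in = insns[i];
    auto it = defs_map.find(in.def);
    if (in.in_chain && in.def_used_outside && it != defs_map.end())
      after[i].push_back("copy-out r" + std::to_string(in.def) +
                         " <- r" + std::to_string(it->second));
  }

  // Flatten: each original insn (tagged S or V) followed by its copies.
  std::vector<std::string> out;
  for (std::size_t i = 0; i < insns.size(); ++i) {
    out.push_back((insns[i].in_chain ? "V " : "S ") + insns[i].text);
    for (const std::string& s : after[i])
      out.push_back(s);
  }
  return out;
}
```

The point of the model is that insns outside the chain are never edited; only copy insns are added next to them, which matches the live-range splitting at the chain boundary described in the text.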
>>>>
>>>> For PR91527 IRA doesn't like the REG_EQUIV note in
>>>>
>>>> (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0)
>>>> (subreg:V4SI (reg:SI 100) 0))
>>>> "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4
>>>> 1248 {movv4si_internal}
>>>> (expr_list:REG_DEAD (reg:SI 100)
>>>> (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp)
>>>> (const_int 16 [0x10])) [1 c+0 S4 A64])
>>>> (nil))))
>>>>
>>>> because the SET_DEST is not a REG_P. I'm not sure if this
>>>> is invalid RTL; the docs say SET_DEST might be a strict_low_part
>>>> or a zero_extract but don't mention a subreg. So I opted
>>>> to simply remove equal/equiv notes on insns we convert,
>>>> and since the above has a REG_DEAD note I took the liberty
>>>> to update that according to the mapping (so that would not
>>>> have been needed before this patch) rather than dropping it.
>>>>
>>>> Bootstrapped with and without --with-march=westmere (to get
>>>> some STV coverage, this included all languages) on
>>>> x86_64-unknown-linux-gnu, testing in progress.
>>>>
>>>> OK if testing succeeds?
>>>
>>> Testing revealed I made an error in general_scalar_chain::convert_insn
>>> failing to move down SET_SRC extraction below replacing with
>>> the defs map. This showed in 4 execute FAILs in 32bit fortran
>>> testing (only). Fixed by moving down the whole def_set/src/dst
>>> extraction.
>>>
>>> Re-testing on x86_64-unknown-linux-gnu.
>>
>> Bootstrapped / tested on x86_64-unknown-linux-gnu. The "no-costmodel"
>> run runs into the latent PR91528 building 32bit libada in stage3
>> for a few sources, I've manually built those with -mno-stv and
>> bootstrap finishes successfully. I hope HJ can help with this
>> dynamic stack-alignment issue.
>>
>> So - OK for trunk?
>>
>> As followup we can now remove general_remove_non_convertible_regs
>> because we can handle defs that cannot be converted just fine
>> with the patch since we're splitting live-ranges of all defs at
>> the chain boundary.
>>
>> Thanks,
>> Richard.
>>
>>> Updated patch below. I'm feeling adventurous and will run
>>> the "westmere" bootstrap with costing disabled (aka always
>>> convert detected chains...).
>>>
>>> Richard.
>>>
>>> 2019-08-23 Richard Biener <rguent...@suse.de>
>>>
>>> PR target/91522
>>> PR target/91527
>>> * config/i386/i386-features.h (general_scalar_chain::defs_map):
>>> New member.
>>> (general_scalar_chain::replace_with_subreg): Remove.
>>> (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
>>> (general_scalar_chain::convert_reg): Adjust signature.
>>> * config/i386/i386-features.c (scalar_chain::add_insn): Do not
>>> iterate over all defs of a reg.
>>> (general_scalar_chain::replace_with_subreg): Remove.
>>> (general_scalar_chain::replace_with_subreg_in_insn): Likewise.
>>> (general_scalar_chain::make_vector_copies): Populate defs_map,
>>> place copy only after defs that are used as vectors in the chain.
>>> (general_scalar_chain::convert_reg): Emit a copy for a specific
>>> def in a specific instruction.
>>> (general_scalar_chain::convert_op): All reg uses are converted here.
>>> (general_scalar_chain::convert_insn): Emit copies for scalar
>>> uses of defs here. Replace uses with the copies we created.
>>> Replace and convert the def. Adjust REG_DEAD notes, remove
>>> REG_EQUIV/EQUAL notes.
>>> (general_scalar_chain::convert_registers): Only handle copies
>>> into the chain here.
>
> Rubberstamped with LGTM. It looks like you are the master of this domain now ;)
This breaks bootstrap for i686-darwin (and most likely is the cause of
bootstrap fail on i686-linux i686-linux-gnu at
https://gcc.gnu.org/ml/gcc-regression/2019-08/msg00398.html et al).
It gives a bunch of compare errors spread around the tree, so no specific
pointer there.
There’s a secondary fail overlaying it between 274933-or so and 274980, which
confused my initial search.
Iain
>
> Thanks,
> Uros.
>
>>>
>>>
>>> Index: gcc/config/i386/i386-features.c
>>> ===================================================================
>>> --- gcc/config/i386/i386-features.c (revision 274843)
>>> +++ gcc/config/i386/i386-features.c (working copy)
>>> @@ -416,13 +416,9 @@ scalar_chain::add_insn (bitmap candidate
>>> iterates over all refs to look for dual-mode regs. Instead this
>>> should be done separately for all regs mentioned in the chain once. */
>>> df_ref ref;
>>> - df_ref def;
>>> for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
>>> if (!HARD_REGISTER_P (DF_REF_REG (ref)))
>>> - for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref));
>>> - def;
>>> - def = DF_REF_NEXT_REG (def))
>>> - analyze_register_chain (candidates, def);
>>> + analyze_register_chain (candidates, ref);
>>> for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
>>> if (!DF_REF_REG_MEM_P (ref))
>>> analyze_register_chain (candidates, ref);
>>> @@ -605,42 +601,6 @@ general_scalar_chain::compute_convert_ga
>>> return gain;
>>> }
>>>
>>> -/* Replace REG in X with a V2DI subreg of NEW_REG. */
>>> -
>>> -rtx
>>> -general_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg)
>>> -{
>>> - if (x == reg)
>>> - return gen_rtx_SUBREG (vmode, new_reg, 0);
>>> -
>>> - /* But not in memory addresses. */
>>> - if (MEM_P (x))
>>> - return x;
>>> -
>>> - const char *fmt = GET_RTX_FORMAT (GET_CODE (x));
>>> - int i, j;
>>> - for (i = GET_RTX_LENGTH (GET_CODE (x)) - 1; i >= 0; i--)
>>> - {
>>> - if (fmt[i] == 'e')
>>> - XEXP (x, i) = replace_with_subreg (XEXP (x, i), reg, new_reg);
>>> - else if (fmt[i] == 'E')
>>> - for (j = XVECLEN (x, i) - 1; j >= 0; j--)
>>> - XVECEXP (x, i, j) = replace_with_subreg (XVECEXP (x, i, j),
>>> - reg, new_reg);
>>> - }
>>> -
>>> - return x;
>>> -}
>>> -
>>> -/* Replace REG in INSN with a V2DI subreg of NEW_REG. */
>>> -
>>> -void
>>> -general_scalar_chain::replace_with_subreg_in_insn (rtx_insn *insn,
>>> - rtx reg, rtx new_reg)
>>> -{
>>> - replace_with_subreg (single_set (insn), reg, new_reg);
>>> -}
>>> -
>>> /* Insert generated conversion instruction sequence INSNS
>>> after instruction AFTER. New BB may be required in case
>>> instruction has EH region attached. */
>>> @@ -691,204 +651,147 @@ general_scalar_chain::make_vector_copies
>>> rtx vreg = gen_reg_rtx (smode);
>>> df_ref ref;
>>>
>>> - for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
>>> - if (!bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
>>> - {
>>> - start_sequence ();
>>> - if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
>>> - {
>>> - rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
>>> - if (smode == DImode && !TARGET_64BIT)
>>> - {
>>> - emit_move_insn (adjust_address (tmp, SImode, 0),
>>> - gen_rtx_SUBREG (SImode, reg, 0));
>>> - emit_move_insn (adjust_address (tmp, SImode, 4),
>>> - gen_rtx_SUBREG (SImode, reg, 4));
>>> - }
>>> - else
>>> - emit_move_insn (copy_rtx (tmp), reg);
>>> - emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
>>> - gen_gpr_to_xmm_move_src (vmode, tmp)));
>>> - }
>>> - else if (!TARGET_64BIT && smode == DImode)
>>> - {
>>> - if (TARGET_SSE4_1)
>>> - {
>>> - emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> - CONST0_RTX (V4SImode),
>>> - gen_rtx_SUBREG (SImode, reg, 0)));
>>> - emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> - gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> - gen_rtx_SUBREG (SImode, reg, 4),
>>> - GEN_INT (2)));
>>> - }
>>> - else
>>> - {
>>> - rtx tmp = gen_reg_rtx (DImode);
>>> - emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> - CONST0_RTX (V4SImode),
>>> - gen_rtx_SUBREG (SImode, reg, 0)));
>>> - emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
>>> - CONST0_RTX (V4SImode),
>>> - gen_rtx_SUBREG (SImode, reg, 4)));
>>> - emit_insn (gen_vec_interleave_lowv4si
>>> - (gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> - gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> - gen_rtx_SUBREG (V4SImode, tmp, 0)));
>>> - }
>>> - }
>>> - else
>>> - emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
>>> - gen_gpr_to_xmm_move_src (vmode, reg)));
>>> - rtx_insn *seq = get_insns ();
>>> - end_sequence ();
>>> - rtx_insn *insn = DF_REF_INSN (ref);
>>> - emit_conversion_insns (seq, insn);
>>> -
>>> - if (dump_file)
>>> - fprintf (dump_file,
>>> - " Copied r%d to a vector register r%d for insn %d\n",
>>> - regno, REGNO (vreg), INSN_UID (insn));
>>> - }
>>> -
>>> - for (ref = DF_REG_USE_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
>>> - if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
>>> - {
>>> - rtx_insn *insn = DF_REF_INSN (ref);
>>> - replace_with_subreg_in_insn (insn, reg, vreg);
>>> -
>>> - if (dump_file)
>>> - fprintf (dump_file, " Replaced r%d with r%d in insn %d\n",
>>> - regno, REGNO (vreg), INSN_UID (insn));
>>> - }
>>> -}
>>> -
>>> -/* Convert all definitions of register REGNO
>>> - and fix its uses. Scalar copies may be created
>>> - in case register is used in not convertible insn. */
>>> -
>>> -void
>>> -general_scalar_chain::convert_reg (unsigned regno)
>>> -{
>>> - bool scalar_copy = bitmap_bit_p (defs_conv, regno);
>>> - rtx reg = regno_reg_rtx[regno];
>>> - rtx scopy = NULL_RTX;
>>> - df_ref ref;
>>> - bitmap conv;
>>> -
>>> - conv = BITMAP_ALLOC (NULL);
>>> - bitmap_copy (conv, insns);
>>> -
>>> - if (scalar_copy)
>>> - scopy = gen_reg_rtx (smode);
>>> + defs_map.put (reg, vreg);
>>>
>>> + /* For each insn defining REGNO, see if it is defined by an insn
>>> + not part of the chain but with uses in insns part of the chain
>>> + and insert a copy in that case. */
>>> for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
>>> {
>>> - rtx_insn *insn = DF_REF_INSN (ref);
>>> - rtx def_set = single_set (insn);
>>> - rtx src = SET_SRC (def_set);
>>> - rtx reg = DF_REF_REG (ref);
>>> + if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
>>> + continue;
>>> + df_link *use;
>>> + for (use = DF_REF_CHAIN (ref); use; use = use->next)
>>> + if (!DF_REF_REG_MEM_P (use->ref)
>>> + && bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
>>> + break;
>>> + if (!use)
>>> + continue;
>>>
>>> - if (!MEM_P (src))
>>> + start_sequence ();
>>> + if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
>>> {
>>> - replace_with_subreg_in_insn (insn, reg, reg);
>>> - bitmap_clear_bit (conv, INSN_UID (insn));
>>> + rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
>>> + if (smode == DImode && !TARGET_64BIT)
>>> + {
>>> + emit_move_insn (adjust_address (tmp, SImode, 0),
>>> + gen_rtx_SUBREG (SImode, reg, 0));
>>> + emit_move_insn (adjust_address (tmp, SImode, 4),
>>> + gen_rtx_SUBREG (SImode, reg, 4));
>>> + }
>>> + else
>>> + emit_move_insn (copy_rtx (tmp), reg);
>>> + emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
>>> + gen_gpr_to_xmm_move_src (vmode, tmp)));
>>> }
>>> -
>>> - if (scalar_copy)
>>> + else if (!TARGET_64BIT && smode == DImode)
>>> {
>>> - start_sequence ();
>>> - if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
>>> + if (TARGET_SSE4_1)
>>> {
>>> - rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
>>> - emit_move_insn (tmp, reg);
>>> - if (!TARGET_64BIT && smode == DImode)
>>> - {
>>> - emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
>>> - adjust_address (tmp, SImode, 0));
>>> - emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
>>> - adjust_address (tmp, SImode, 4));
>>> - }
>>> - else
>>> - emit_move_insn (scopy, copy_rtx (tmp));
>>> + emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> + CONST0_RTX (V4SImode),
>>> + gen_rtx_SUBREG (SImode, reg, 0)));
>>> + emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> + gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> + gen_rtx_SUBREG (SImode, reg, 4),
>>> + GEN_INT (2)));
>>> }
>>> - else if (!TARGET_64BIT && smode == DImode)
>>> + else
>>> {
>>> - if (TARGET_SSE4_1)
>>> - {
>>> - rtx tmp = gen_rtx_PARALLEL (VOIDmode,
>>> - gen_rtvec (1, const0_rtx));
>>> - emit_insn
>>> - (gen_rtx_SET
>>> - (gen_rtx_SUBREG (SImode, scopy, 0),
>>> - gen_rtx_VEC_SELECT (SImode,
>>> - gen_rtx_SUBREG (V4SImode, reg, 0),
>>> - tmp)));
>>> -
>>> - tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
>>> - emit_insn
>>> - (gen_rtx_SET
>>> - (gen_rtx_SUBREG (SImode, scopy, 4),
>>> - gen_rtx_VEC_SELECT (SImode,
>>> - gen_rtx_SUBREG (V4SImode, reg, 0),
>>> - tmp)));
>>> - }
>>> - else
>>> - {
>>> - rtx vcopy = gen_reg_rtx (V2DImode);
>>> - emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, reg, 0));
>>> - emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
>>> - gen_rtx_SUBREG (SImode, vcopy, 0));
>>> - emit_move_insn (vcopy,
>>> - gen_rtx_LSHIFTRT (V2DImode,
>>> - vcopy, GEN_INT (32)));
>>> - emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
>>> - gen_rtx_SUBREG (SImode, vcopy, 0));
>>> - }
>>> + rtx tmp = gen_reg_rtx (DImode);
>>> + emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> + CONST0_RTX (V4SImode),
>>> + gen_rtx_SUBREG (SImode, reg, 0)));
>>> + emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
>>> + CONST0_RTX (V4SImode),
>>> + gen_rtx_SUBREG (SImode, reg, 4)));
>>> + emit_insn (gen_vec_interleave_lowv4si
>>> + (gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> + gen_rtx_SUBREG (V4SImode, vreg, 0),
>>> + gen_rtx_SUBREG (V4SImode, tmp, 0)));
>>> }
>>> - else
>>> - emit_move_insn (scopy, reg);
>>> -
>>> - rtx_insn *seq = get_insns ();
>>> - end_sequence ();
>>> - emit_conversion_insns (seq, insn);
>>> -
>>> - if (dump_file)
>>> - fprintf (dump_file,
>>> - " Copied r%d to a scalar register r%d for insn %d\n",
>>> - regno, REGNO (scopy), INSN_UID (insn));
>>> }
>>> - }
>>> -
>>> - for (ref = DF_REG_USE_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
>>> - if (bitmap_bit_p (insns, DF_REF_INSN_UID (ref)))
>>> - {
>>> - if (bitmap_bit_p (conv, DF_REF_INSN_UID (ref)))
>>> - {
>>> - rtx_insn *insn = DF_REF_INSN (ref);
>>> + else
>>> + emit_insn (gen_rtx_SET (gen_rtx_SUBREG (vmode, vreg, 0),
>>> + gen_gpr_to_xmm_move_src (vmode, reg)));
>>> + rtx_insn *seq = get_insns ();
>>> + end_sequence ();
>>> + rtx_insn *insn = DF_REF_INSN (ref);
>>> + emit_conversion_insns (seq, insn);
>>>
>>> - rtx def_set = single_set (insn);
>>> - gcc_assert (def_set);
>>> + if (dump_file)
>>> + fprintf (dump_file,
>>> + " Copied r%d to a vector register r%d for insn %d\n",
>>> + regno, REGNO (vreg), INSN_UID (insn));
>>> + }
>>> +}
>>>
>>> - rtx src = SET_SRC (def_set);
>>> - rtx dst = SET_DEST (def_set);
>>> +/* Copy the definition SRC of INSN inside the chain to DST for
>>> + scalar uses outside of the chain. */
>>>
>>> - if (!MEM_P (dst) || !REG_P (src))
>>> - replace_with_subreg_in_insn (insn, reg, reg);
>>> +void
>>> +general_scalar_chain::convert_reg (rtx_insn *insn, rtx dst, rtx src)
>>> +{
>>> + start_sequence ();
>>> + if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
>>> + {
>>> + rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
>>> + emit_move_insn (tmp, src);
>>> + if (!TARGET_64BIT && smode == DImode)
>>> + {
>>> + emit_move_insn (gen_rtx_SUBREG (SImode, dst, 0),
>>> + adjust_address (tmp, SImode, 0));
>>> + emit_move_insn (gen_rtx_SUBREG (SImode, dst, 4),
>>> + adjust_address (tmp, SImode, 4));
>>> + }
>>> + else
>>> + emit_move_insn (dst, copy_rtx (tmp));
>>> + }
>>> + else if (!TARGET_64BIT && smode == DImode)
>>> + {
>>> + if (TARGET_SSE4_1)
>>> + {
>>> + rtx tmp = gen_rtx_PARALLEL (VOIDmode,
>>> + gen_rtvec (1, const0_rtx));
>>> + emit_insn
>>> + (gen_rtx_SET
>>> + (gen_rtx_SUBREG (SImode, dst, 0),
>>> + gen_rtx_VEC_SELECT (SImode,
>>> + gen_rtx_SUBREG (V4SImode, src, 0),
>>> + tmp)));
>>> +
>>> + tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
>>> + emit_insn
>>> + (gen_rtx_SET
>>> + (gen_rtx_SUBREG (SImode, dst, 4),
>>> + gen_rtx_VEC_SELECT (SImode,
>>> + gen_rtx_SUBREG (V4SImode, src, 0),
>>> + tmp)));
>>> + }
>>> + else
>>> + {
>>> + rtx vcopy = gen_reg_rtx (V2DImode);
>>> + emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, src, 0));
>>> + emit_move_insn (gen_rtx_SUBREG (SImode, dst, 0),
>>> + gen_rtx_SUBREG (SImode, vcopy, 0));
>>> + emit_move_insn (vcopy,
>>> + gen_rtx_LSHIFTRT (V2DImode,
>>> + vcopy, GEN_INT (32)));
>>> + emit_move_insn (gen_rtx_SUBREG (SImode, dst, 4),
>>> + gen_rtx_SUBREG (SImode, vcopy, 0));
>>> + }
>>> + }
>>> + else
>>> + emit_move_insn (dst, src);
>>>
>>> - bitmap_clear_bit (conv, INSN_UID (insn));
>>> - }
>>> - }
>>> - /* Skip debug insns and uninitialized uses. */
>>> - else if (DF_REF_CHAIN (ref)
>>> - && NONDEBUG_INSN_P (DF_REF_INSN (ref)))
>>> - {
>>> - gcc_assert (scopy);
>>> - replace_rtx (DF_REF_INSN (ref), reg, scopy);
>>> - df_insn_rescan (DF_REF_INSN (ref));
>>> - }
>>> + rtx_insn *seq = get_insns ();
>>> + end_sequence ();
>>> + emit_conversion_insns (seq, insn);
>>>
>>> - BITMAP_FREE (conv);
>>> + if (dump_file)
>>> + fprintf (dump_file,
>>> + " Copied r%d to a scalar register r%d for insn %d\n",
>>> + REGNO (src), REGNO (dst), INSN_UID (insn));
>>> }
>>>
>>> /* Convert operand OP in INSN. We should handle
>>> @@ -921,16 +824,6 @@ general_scalar_chain::convert_op (rtx *o
>>> }
>>> else if (REG_P (*op))
>>> {
>>> - /* We may have not converted register usage in case
>>> - this register has no definition. Otherwise it
>>> - should be converted in convert_reg. */
>>> - df_ref ref;
>>> - FOR_EACH_INSN_USE (ref, insn)
>>> - if (DF_REF_REGNO (ref) == REGNO (*op))
>>> - {
>>> - gcc_assert (!DF_REF_CHAIN (ref));
>>> - break;
>>> - }
>>> *op = gen_rtx_SUBREG (vmode, *op, 0);
>>> }
>>> else if (CONST_INT_P (*op))
>>> @@ -975,6 +868,32 @@ general_scalar_chain::convert_op (rtx *o
>>> void
>>> general_scalar_chain::convert_insn (rtx_insn *insn)
>>> {
>>> + /* Generate copies for out-of-chain uses of defs. */
>>> + for (df_ref ref = DF_INSN_DEFS (insn); ref; ref = DF_REF_NEXT_LOC (ref))
>>> + if (bitmap_bit_p (defs_conv, DF_REF_REGNO (ref)))
>>> + {
>>> + df_link *use;
>>> + for (use = DF_REF_CHAIN (ref); use; use = use->next)
>>> + if (DF_REF_REG_MEM_P (use->ref)
>>> + || !bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
>>> + break;
>>> + if (use)
>>> + convert_reg (insn, DF_REF_REG (ref),
>>> + *defs_map.get (regno_reg_rtx [DF_REF_REGNO (ref)]));
>>> + }
>>> +
>>> + /* Replace uses in this insn with the defs we use in the chain. */
>>> + for (df_ref ref = DF_INSN_USES (insn); ref; ref = DF_REF_NEXT_LOC (ref))
>>> + if (!DF_REF_REG_MEM_P (ref))
>>> + if (rtx *vreg = defs_map.get (regno_reg_rtx[DF_REF_REGNO (ref)]))
>>> + {
>>> + /* Also update a corresponding REG_DEAD note. */
>>> + rtx note = find_reg_note (insn, REG_DEAD, DF_REF_REG (ref));
>>> + if (note)
>>> + XEXP (note, 0) = *vreg;
>>> + *DF_REF_REAL_LOC (ref) = *vreg;
>>> + }
>>> +
>>> rtx def_set = single_set (insn);
>>> rtx src = SET_SRC (def_set);
>>> rtx dst = SET_DEST (def_set);
>>> @@ -988,6 +907,20 @@ general_scalar_chain::convert_insn (rtx_
>>> emit_conversion_insns (gen_move_insn (dst, tmp), insn);
>>> dst = gen_rtx_SUBREG (vmode, tmp, 0);
>>> }
>>> + else if (REG_P (dst))
>>> + {
>>> + /* Replace the definition with a SUBREG to the definition we
>>> + use inside the chain. */
>>> + rtx *vdef = defs_map.get (dst);
>>> + if (vdef)
>>> + dst = *vdef;
>>> + dst = gen_rtx_SUBREG (vmode, dst, 0);
>>> + /* IRA doesn't like to have REG_EQUAL/EQUIV notes when the SET_DEST
>>> + is a non-REG_P. So kill those off. */
>>> + rtx note = find_reg_equal_equiv_note (insn);
>>> + if (note)
>>> + remove_note (insn, note);
>>> + }
>>>
>>> switch (GET_CODE (src))
>>> {
>>> @@ -1045,20 +978,15 @@ general_scalar_chain::convert_insn (rtx_
>>> case COMPARE:
>>> src = SUBREG_REG (XEXP (XEXP (src, 0), 0));
>>>
>>> - gcc_assert ((REG_P (src) && GET_MODE (src) == DImode)
>>> - || (SUBREG_P (src) && GET_MODE (src) == V2DImode));
>>> -
>>> - if (REG_P (src))
>>> - subreg = gen_rtx_SUBREG (V2DImode, src, 0);
>>> - else
>>> - subreg = copy_rtx_if_shared (src);
>>> + gcc_assert (REG_P (src) && GET_MODE (src) == DImode);
>>> + subreg = gen_rtx_SUBREG (V2DImode, src, 0);
>>> emit_insn_before (gen_vec_interleave_lowv2di (copy_rtx_if_shared (subreg),
>>> copy_rtx_if_shared (subreg),
>>> copy_rtx_if_shared (subreg)),
>>> insn);
>>> dst = gen_rtx_REG (CCmode, FLAGS_REG);
>>> - src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (src),
>>> - copy_rtx_if_shared (src)),
>>> + src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (subreg),
>>> + copy_rtx_if_shared (subreg)),
>>> UNSPEC_PTEST);
>>> break;
>>>
>>> @@ -1217,16 +1145,15 @@ timode_scalar_chain::convert_insn (rtx_i
>>> df_insn_rescan (insn);
>>> }
>>>
>>> +/* Generate copies from defs used by the chain but not defined therein.
>>> + Also populates defs_map which is used later by convert_insn. */
>>> +
>>> void
>>> general_scalar_chain::convert_registers ()
>>> {
>>> bitmap_iterator bi;
>>> unsigned id;
>>> -
>>> - EXECUTE_IF_SET_IN_BITMAP (defs, 0, id, bi)
>>> - convert_reg (id);
>>> -
>>> - EXECUTE_IF_AND_COMPL_IN_BITMAP (defs_conv, defs, 0, id, bi)
>>> + EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, id, bi)
>>> make_vector_copies (id);
>>> }
>>>
>>> Index: gcc/config/i386/i386-features.h
>>> ===================================================================
>>> --- gcc/config/i386/i386-features.h (revision 274843)
>>> +++ gcc/config/i386/i386-features.h (working copy)
>>> @@ -171,12 +171,11 @@ class general_scalar_chain : public scal
>>> : scalar_chain (smode_, vmode_) {}
>>> int compute_convert_gain ();
>>> private:
>>> + hash_map<rtx, rtx> defs_map;
>>> void mark_dual_mode_def (df_ref def);
>>> - rtx replace_with_subreg (rtx x, rtx reg, rtx subreg);
>>> - void replace_with_subreg_in_insn (rtx_insn *insn, rtx reg, rtx subreg);
>>> void convert_insn (rtx_insn *insn);
>>> void convert_op (rtx *op, rtx_insn *insn);
>>> - void convert_reg (unsigned regno);
>>> + void convert_reg (rtx_insn *insn, rtx dst, rtx src);
>>> void make_vector_copies (unsigned regno);
>>> void convert_registers ();
>>> int vector_const_cost (rtx exp);
>>>
>>
>> --
>> Richard Biener <rguent...@suse.de>
>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
>> Germany; GF: Felix Imendörffer; HRB 247165 (AG München)