On Mon, Aug 26, 2019 at 2:49 PM Richard Biener <rguent...@suse.de> wrote:
>
> On Mon, 26 Aug 2019, Uros Bizjak wrote:
>
> > On Mon, Aug 26, 2019 at 2:11 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> > >
> > > On Mon, Aug 26, 2019 at 1:46 PM Richard Biener <rguent...@suse.de> wrote:
> > > >
> > > > On Fri, 23 Aug 2019, Uros Bizjak wrote:
> > > >
> > > > > On Fri, Aug 23, 2019 at 1:52 PM Richard Biener <rguent...@suse.de> 
> > > > > wrote:
> > > > > >
> > > > > > On Fri, 23 Aug 2019, Uros Bizjak wrote:
> > > > > >
> > > > > > > This is currently a heads-up patch that removes the minimum
> > > > > > > limit on the cost of moves to/from XMM registers. The immediate
> > > > > > > benefit is the removal of mismatched spills caused by subreg
> > > > > > > usage.
> > > > > > >
> > > > > > > *If* the patch proves to be beneficial (as in "doesn't regress
> > > > > > > important benchmarks"), then we should be able to un-hide the
> > > > > > > inter-regset moves from the RA and allow it to collapse some
> > > > > > > moves. As an example, the patched compiler removes a movd in
> > > > > > > gcc.target/i386/minmax-6.c and still avoids the mismatched spill.
> > > > > > >
> > > > > > > 2019-08-23  Uroš Bizjak  <ubiz...@gmail.com>
> > > > > > >
> > > > > > >     * config/i386/i386.c (ix86_register_move_cost): Do not
> > > > > > >     limit the cost of moves to/from XMM register to minimum 8.
> > > > > > >     * config/i386/i386-features.c
> > > > > > >     (general_scalar_chain::make_vector_copies): Do not generate
> > > > > > >     zeroing move from GPR to XMM register, use gen_move_insn
> > > > > > >     instead of gen_gpr_to_xmm_move_src.
> > > > > > >     (general_scalar_chain::convert_op): Ditto.
> > > > > > >     (gen_gpr_to_xmm_move_src): Remove.
> > > > > > >
> > > > > > > The patch was bootstrapped and regression tested on
> > > > > > > x86_64-linux-gnu {,-m32}, configured with and without
> > > > > > > --with-arch=ivybridge.
> > > > > > >
> > > > > > > The patch regresses the PR80481 scan-asm-not test (where the
> > > > > > > compiler generates an unrelated XMM spill on register-starved
> > > > > > > x86_32). However, during the analysis I found that the original
> > > > > > > issue is not fixed, and is still visible without -funroll-loops
> > > > > > > [1].
> > > > > > >
> > > > > > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80481#c10
> > > > > > >
> > > > > > > So, I'd wait for HJ's benchmark results for the to/from XMM
> > > > > > > cost change before proceeding with the patch.
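
For reference, the ix86_register_move_cost part of the patch boils down
to roughly the following sketch; the exact cost-table field names may
differ from the committed hunk:

  /* Sketch only: for GPR<->SSE moves, return the value from the active
     tuning table directly, instead of the previous
     "return MAX (8, <table value>);" that imposed a minimum cost of 8.  */
  if (SSE_CLASS_P (class1) != SSE_CLASS_P (class2))
    return (SSE_CLASS_P (class1)
            ? ix86_cost->sse_to_integer
            : ix86_cost->integer_to_sse);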
> > > > > >
> > > > > > Otherwise looks good to me, it might clash (whitespace wise)
> > > > > > with my STV "rewrite" though.
> > > > > >
> > > > > > We might want to adjust ix86_register_move_cost separately from
> > > > > > the STV change to use regular moves though?  Just to make bisection
> > > > > > point to either of the two - STV "fallout" is probably minor
> > > > > > compared to fallout elsewhere...
> > > > >
> > > > > Yes, this is also my plan.
> > > >
> > > > Btw, when testing w/ the costing disabled I run into
> > > >
> > > > (insn 31 7 8 2 (set (subreg:V4SI (reg:SI 107) 0)
> > > >         (vec_merge:V4SI (vec_duplicate:V4SI (mem/c:SI (const:DI (plus:DI
> > > >                             (symbol_ref:DI ("peakbuf") [flags 0x2]  <var_decl 0x7efe8dbfeab0 peakbuf>)
> > > >                             (const_int 400020000 [0x17d7d220]))) [1 peakbuf.PeaksInBuf+0 S4 A64]))
> > > >             (const_vector:V4SI [
> > > >                     (const_int 0 [0]) repeated x4
> > > >                 ])
> > > >             (const_int 1 [0x1])))
> > > >
> > > > being not recognized (for the large immediate I guess).  So when
> > > > just doing
> > >
> > > Hard to say without a testcase, but I don't think this is a valid
> > > memory address. Can you see in the dumps which insn is getting
> > > converted here?
> >
> > Ah, we can use movabsdi in this case.
>
> Yes, this is simply STVing such a single load... (quite pointless
> of course).
>
> > > >   (set (reg:SI 107) (mem:SI ...))
> > > >
> > > > here we expect IRA to be able to allocate 107 into xmm, realizing
> > > > it needs to reload the RHS first?
> > > >
> > > > For current code, is testing x86_64_general_operand like in the
> > > > following the correct thing to do?
> > >
> > > x86_64_general_operand will only limit immediates using
> > > x86_64_immediate_operand; it will allow all memory_operands.
> >
> > Yes, I guess it is OK; an invalid embedded address can be loaded
> > using movabsq.
>
> OK, combining this with your patch for extended testing now.
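
For reference, the check under discussion amounts to roughly the
following sketch (not a quote of the actual patch; "src" stands for the
insn's source operand in the STV candidate test):

  /* Sketch: reject source operands that x86_64_general_operand does not
     accept.  This only filters out immediates failing
     x86_64_immediate_operand; all memory operands pass, and a memory
     source whose embedded address is not a valid immediate can still be
     handled by reloading the address, e.g. via movabsq.  */
  if (!x86_64_general_operand (src, GET_MODE (src)))
    return false;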

Please note that this new code will be removed by the above patch. The
failing pattern will be replaced with a simple move to a SUBREG, where
the RA is able to correctly reload the operands.
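
For completeness, the replacement is essentially a plain move into the
vector-mode SUBREG, along the lines of this sketch (illustrative only,
in the style of general_scalar_chain::make_vector_copies; vmode, vreg
and reg stand for the chain's vector mode, the new vector pseudo and
the original GPR):

  /* Sketch: instead of a zero-extending vec_merge load, emit a plain
     move into the vector-mode subreg of the new pseudo; the register
     allocator can then reload the operand as needed.  */
  emit_insn (gen_move_insn (gen_rtx_SUBREG (vmode, vreg, 0), reg));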

Uros.
