On Wed, Nov 5, 2014 at 10:35 AM, Ilya Enkovich <enkovich....@gmail.com> wrote: > Hi, > > Having stage1 close to end, may we make some decision regarding this > patch? Having a couple of working variants, may we choose and use one > of them?
I propose to wait for Vlad for an update about his plans on register preference algorythm that would fix this (and other "Ya*r"-type issues). In the absence of the fix, we'll go with "Yr,*x". Uros. > > Thanks, > Ilya > > 2014-07-15 17:38 GMT+04:00 Uros Bizjak <ubiz...@gmail.com>: >> On Tue, Jul 15, 2014 at 1:01 PM, Ilya Enkovich <enkovich....@gmail.com> >> wrote: >>> On 15 Jul 10:42, Uros Bizjak wrote: >>>> On Tue, Jul 15, 2014 at 10:25 AM, Ilya Enkovich <enkovich....@gmail.com> >>>> wrote: >>>> >>>> >>>>> Also fully restrict xmm8-15 does not seem right. It is just costly >>>> >>>>> but not fully disallowed. >>>> >>>> >>>> >>>> As said earlier, you can try "Ya*x" as a constraint. >>>> >>> >>>> >>> I tried it. It does not seem to affect allocation much. I do not see >>>> >>> any gain on targeted tests. >>>> >> >>>> >> Strange, because the documentation claims: >>>> >> >>>> >> '*' >>>> >> Says that the following character should be ignored when choosing >>>> >> register preferences. '*' has no effect on the meaning of the >>>> >> constraint as a constraint, and no effect on reloading. For LRA >>>> >> '*' additionally disparages slightly the alternative if the >>>> >> following character matches the operand. >>>> >> >>>> >> Let me rethink this a bit. Prehaps we could reconsider Jakub's >>>> >> proposal with "Ya,!x" (with two alternatives). IIRC this approach was >>>> >> needed for some MMX alternatives, where we didn't want RA to allocate >>>> >> a MMX register when the value could be passed in integer regs, but the >>>> >> value was still allowed in MMX register. >>>> > >>>> > That's is what my patch already does, but with '?' instead of '!'. >>>> >>>> Yes, I know. The problem is, that Ya*x type conditional allocation >>>> worked OK in the past for "not preferred, but still alowed regclass" >>>> registers, There are several patterns in i386.md that live by this >>>> premise, including movsf_internal and movdf_internal. If this approach >>>> doesn't work anymore, then we have to either figure out what is the >>>> reason, or invent a new strategy that will be applicable to all cases. >>>> >>>> Can you please post a small test that illustrates the case where Ya,!x >>>> works, but Ya*x doesn't? >>> >>> It's hard to compose a small testcase which will have SSE4 instructions >>> generated with required register usage. I use tcpjumbo test from TCPmark >>> for initial check of how my patch works. This test has a lot of pmovzxwd >>> instructions generated and many of them use xmm8-15. I tried two versions >>> of a simple patch which modifies only pmovzxwd instruction. >>> >>> Patch1: >>> >>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md >>> index d907353..6b03b72 100644 >>> --- a/gcc/config/i386/sse.md >>> +++ b/gcc/config/i386/sse.md >>> @@ -11852,10 +11852,10 @@ >>> (set_attr "mode" "OI")]) >>> >>> (define_insn "sse4_1_<code>v4hiv4si2" >>> - [(set (match_operand:V4SI 0 "register_operand" "=x") >>> + [(set (match_operand:V4SI 0 "register_operand" "=Yr,!x") >>> (any_extend:V4SI >>> (vec_select:V4HI >>> - (match_operand:V8HI 1 "nonimmediate_operand" "xm") >>> + (match_operand:V8HI 1 "nonimmediate_operand" "Yr,!xm") >>> (parallel [(const_int 0) (const_int 1) >>> (const_int 2) (const_int 3)]))))] >>> "TARGET_SSE4_1" >>> >>> Patch2: >>> >>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md >>> index d907353..b3721c4 100644 >>> --- a/gcc/config/i386/sse.md >>> +++ b/gcc/config/i386/sse.md >>> @@ -11852,10 +11852,10 @@ >>> (set_attr "mode" "OI")]) >>> >>> (define_insn "sse4_1_<code>v4hiv4si2" >>> - [(set (match_operand:V4SI 0 "register_operand" "=x") >>> + [(set (match_operand:V4SI 0 "register_operand" "=Yr*x") >>> (any_extend:V4SI >>> (vec_select:V4HI >>> - (match_operand:V8HI 1 "nonimmediate_operand" "xm") >>> + (match_operand:V8HI 1 "nonimmediate_operand" "Yr*xm") >>> (parallel [(const_int 0) (const_int 1) >>> (const_int 2) (const_int 3)]))))] >>> "TARGET_SSE4_1" >>> >>> >>> Here are results of looking for pmovzxwd in resulting binaries: >>> #objdump -d tcpjumbo-orig | grep pmovzxwd | grep >>> "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l >>> 76 >>> #objdump -d tcpjumbo-patch1 | grep pmovzxwd | grep >>> "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l >>> 0 >>> #objdump -d tcpjumbo-patch2 | grep pmovzxwd | grep >>> "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l >>> 76 >>> >>> Therefore I make a conclusion that Yr*x does not really differ much from x. >> >> Just FTR: >> >> Using "Yr,*x" is also a viable option: >> >> #objdump -d tcpjumbo-patch3 | grep pmovzxwd | grep >> "xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l >> 0 >> >> I believe that the above is the way to go with LRA. Vladimir, what do you >> think? >> >> Uros.