On Mon, Nov 27, 2017 at 6:29 PM, Jeff Law <l...@redhat.com> wrote:
> On 11/23/2017 04:11 AM, Alan Hayward wrote:
>>
>>> On 22 Nov 2017, at 17:33, Jeff Law <l...@redhat.com> wrote:
>>>
>>> On 11/22/2017 04:31 AM, Alan Hayward wrote:
>>>>
>>>>> On 21 Nov 2017, at 03:13, Jeff Law <l...@redhat.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
>>>>>>> totally forgotten about it.  And in fact it seems to come pretty close
>>>>>>> to what you need…
>>>>>>
>>>>>> Yes, some of the code is similar to the way
>>>>>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
>>>>>> CLOBBER expr code served as a starting point for writing the patch.
>>>>>> The main difference here is that _PART_CLOBBERED applies around all
>>>>>> calls and is not tied to a specific instruction; it’s part of the
>>>>>> calling ABI. Whereas clobber_high is explicitly tied to an expression
>>>>>> (tls_desc).
>>>>>> It meant there wasn’t really any opportunity to reuse any existing code.
>>>>> Understood.  Though your first patch mentions that you're trying to
>>>>> describe partial preservation "around TLS calls". Presumably those are
>>>>> represented as normal insns, not call_insn.
>>>>>
>>>>> That brings me back to Richi's idea of exposing a set of the low subreg
>>>>> to itself using whatever mode is wide enough to cover the neon part of
>>>>> the register.
>>>>>
>>>>> That should tell the generic parts of the compiler that you're just
>>>>> clobbering the upper part and at least in theory you can implement in
>>>>> the aarch64 backend and the rest of the compiler should "just work"
>>>>> because that's the existing semantics of a subreg store.
>>>>>
>>>>> The only worry would be if a pass tried to get overly smart and
>>>>> considered that kind of set a nop -- but I think I'd argue that's simply
>>>>> wrong given the semantics of a partial store.
>>>>>
>>>>
>>>> So, instead of using clobber_high(reg X), use set(reg X, reg X).
>>>> It’s something we considered, and then dismissed.
>>>>
>>>> The problem then is you are now using SET semantics on those registers,
>>>> and it would make the register live around the function, which might not
>>>> be the case. Whereas clobber semantics will just make the register dead,
>>>> which is exactly what we want (but only conditionally).
>>> ?!?  A set of the subreg is the *exact* semantics you want.  It says the
>>> low part is preserved while the upper part is clobbered across the TLS
>>> insns.
>>>
>>> jeff
>>
>> Consider where the TLS call is inside a loop. The compiler would normally
>> want to hoist that out of the loop. By adding a set(x,x) into the parallel
>> of the tls_desc we are now making x live across the loop: x is dependent on
>> the value from the previous iteration, and the tls_desc can no longer be
>> hoisted.
> Hmm.  I think I see the problem you're trying to point out.  Let me
> restate it and see if you agree.
>
> The low subreg set does clearly indicate the upper part of the SVE
> register is clobbered.  The problem is from a liveness standpoint the
> compiler is considering the low part live, even though it's a self-set.
>
> In fact, if that is the case, then a single TLS call (independent of a
> loop) would make the low part of the register globally live.  This
> should be testable.  Include one of these low part self sets on the
> existing TLS calls and compile a little test function and let's look at
> the liveness data.
>
>
> Now it could be the case that various local analyses could handle things
> sub-optimally.  You mention LICM.  I know our original LICM did have a
> problem in that if it saw a use of a hard reg in a loop without seeing a
> set of that hard reg it considered the register varying within the loop.
>  I have no idea if we carried that forward when the loop code was
> rewritten (when I looked at this it was circa 1992).
>
>
>>
>> Or consider a stream of code containing two tls_desc calls (ok, the
>> compiler might optimise one of the tls calls away, but this approach should
>> be reusable for other exprs). Between the two set(x,x)’s x is considered
>> live so the register allocator can’t use that register.
>> Given that we are applying this to all the neon registers, the register
>> allocator now throws an ICE because it can’t find any free hard neon
>> registers to use.
> Given your statements it sounds like the liveness infrastructure is
> making those neon regs globally live when it sees the low part subreg
> self-set.  Let's confirm that one way or the other and see where it
> takes us.

Indeed, in (set (subreg:neon reg1) (subreg:neon reg1)) it appears that
the lowpart of reg1 is used and thus it is live, but liveness analysis
can (and should) simply ignore such sets.
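
To make the point concrete, here is a toy backward liveness pass in Python.
It is only a sketch of the idea, not GCC's dataflow code: the Insn type, the
live_before_each function, and the register names x0, x1 and v8 are all made
up for illustration. Each insn is a list of (dest, sources) pairs, and a
self-set is a pair whose only source is its own destination.

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple

# One element of a (parallel ...): a destination register and the
# registers read to compute it. Hypothetical model, not GCC RTL.
SetExpr = Tuple[str, FrozenSet[str]]


class Insn(NamedTuple):
    name: str
    sets: List[SetExpr]


def is_self_set(s: SetExpr) -> bool:
    """True for set(x, x): the only source is the destination itself."""
    dest, srcs = s
    return srcs == frozenset({dest})


def live_before_each(insns: List[Insn], live_out: Set[str],
                     ignore_self_sets: bool) -> List[FrozenSet[str]]:
    """Return the set of registers live immediately before each insn."""
    live = set(live_out)
    result = []
    for insn in reversed(insns):
        defs, uses = set(), set()
        for s in insn.sets:
            if ignore_self_sets and is_self_set(s):
                # A self-set neither kills nor uses the register:
                # whatever liveness it had after the insn is unchanged.
                continue
            defs.add(s[0])
            uses |= s[1]
        live = (live - defs) | uses
        result.append(frozenset(live))
    result.reverse()
    return result


# A tls_desc-like insn that sets x0 and carries a self-set of v8,
# followed by an insn that consumes x0.
insns = [
    Insn("tls_desc", [("x0", frozenset()), ("v8", frozenset({"v8"}))]),
    Insn("use_x0", [("x1", frozenset({"x0"}))]),
]

naive = live_before_each(insns, {"x1"}, ignore_self_sets=False)
fixed = live_before_each(insns, {"x1"}, ignore_self_sets=True)
print("naive:", sorted(naive[0]))  # v8 is reported live before tls_desc
print("fixed:", sorted(fixed[0]))  # nothing is live before tls_desc
```

In this model the naive pass reports v8 as live on entry to the tls_desc
insn purely because of the self-set, while the variant that skips self-sets
leaves v8's liveness to be decided by its real uses elsewhere.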

> Jeff
