On 11/20/2017 08:04 AM, Alan Hayward wrote:
>>>
>>> Yes, that was an early alternative option for the patch.
>>>
>>> With that it would affect every operation that uses SVE registers. A simple
>>> add of two registers now has 4 inputs and two outputs. It would get in the
>>> way when debugging any SVE dumps and be generally annoying.
>>> It's possible that the code for that would all be in the aarch64 target
>>> (making everyone else happy!), but I suspect that there would still be
>>> strange dependency issues that'd need sorting out in the common code.
>>>
>>> Whereas with this patch, there are no new oddities in non-TLS
>>> compiles/dumps.
>>> Although the patch touches a lot of files, the changes are mostly restricted
>>> to places where standard clobbers were already being checked.
>> I'm not entirely sure that it would require doubling the number of
>> inputs/outputs.  It's not conceptually much different than how we
>> describe DImode operations on 32bit targets.  The mode selects one or
>> more consecutive registers, so you don't actually need anything weird in
>> your patterns.  This is pretty standard stuff.
> 
> Ok, fair enough.
> 
>>
>>
>> It would be painful in that the Neon regs would have to interleave with
>> the upper part of the SVE regs in terms of register numbers.  It would
>> also mean that you couldn't tie together multiple neon regs into
>> something wider.  I'm not sure if the latter would be an issue or not.
> 
> And there's also the weirdness that the register would not be split evenly:
> it'll be a TI reg followed by a reg of the size of multiple TIs.
> 
> All of that has the potential to complicate all non-SVE aarch64 code.
Agreed.  Let's drop this line of exploration.


> 
>>
>> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
>> totally forgotten about it.  And in fact it seems to come pretty close
>> to what you need…
> 
> Yes, some of the code is similar to the way
> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
> CLOBBER expr code served as a starting point for writing the patch. The main
> difference here is that _PART_CLOBBERED applies around all calls and is not
> tied to a specific instruction; it's part of the calling ABI. clobber_high,
> by contrast, is explicitly tied to an expression (tls_desc).
> That meant there wasn't really any opportunity to reuse existing code.
Understood.  Though your first patch mentions that you're trying to
describe partial preservation "around TLS calls".  Presumably those are
represented as normal insns, not call_insn.

That brings me back to Richi's idea of exposing a set of the low subreg
to itself using whatever mode is wide enough to cover the neon part of
the register.

That should tell the generic parts of the compiler that you're just
clobbering the upper part, and at least in theory you can implement it in
the aarch64 backend and the rest of the compiler should "just work"
because that's the existing semantics of a subreg store.
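
For illustration only, here is a toy C model of what such a self-set is meant to say. It is not GCC code: it assumes a 256-bit SVE vector length, assumes the low 16 bytes alias the Neon V register, and takes as given (per the argument above) that a store to the low subreg leaves the rest of the register undefined. The names `sve_reg`, `sve_reg_init`, `set_low_subreg_to_itself` and `byte_is_known` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model, not GCC code.  Assumes a 256-bit SVE vector length, with the
   low 16 bytes of each SVE register aliasing the Neon V register.  */
enum { SVE_BYTES = 32, NEON_BYTES = 16 };

struct sve_reg
{
  unsigned char bytes[SVE_BYTES];
  bool known[SVE_BYTES];	/* false once a byte's value is undefined.  */
};

/* Fill REG with VAL and mark every byte as holding a known value.  */
static void
sve_reg_init (struct sve_reg *reg, unsigned char val)
{
  memset (reg->bytes, val, sizeof reg->bytes);
  for (int i = 0; i < SVE_BYTES; i++)
    reg->known[i] = true;
}

/* Model of setting the low NEON_BYTES of REG to themselves, assuming a
   store to the low subreg leaves the rest of the register undefined:
   the low part keeps its value, everything above it is forgotten.  */
static void
set_low_subreg_to_itself (struct sve_reg *reg)
{
  for (int i = NEON_BYTES; i < SVE_BYTES; i++)
    reg->known[i] = false;
}

/* Return true if byte I of REG still holds a known value.  */
static bool
byte_is_known (const struct sve_reg *reg, int i)
{
  assert (i >= 0 && i < SVE_BYTES);
  return reg->known[i];
}
```

After the self-set, bytes 0..15 are still known and bytes 16..31 are not, which is exactly the "clobber only the upper part" effect described above.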

The only worry would be if a pass tried to get overly smart and
considered that kind of set a nop -- but I think I'd argue that's simply
wrong given the semantics of a partial store.
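
To make that distinction concrete, a hypothetical no-op check might look like the sketch below. `self_set` and `self_set_is_noop` are illustrative names, not GCC functions, and sizes are in bytes. The point it shows: a set of a register to itself is only safe to delete when the stored mode covers the whole register; a narrower self-set is a partial store and must be kept.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy description of a register self-set, not GCC's rtx.  Sizes in bytes.  */
struct self_set
{
  int store_bytes;	/* size of the mode being stored, e.g. 16 for the Neon part.  */
  int reg_bytes;	/* size of the full hard register, e.g. 32 for 256-bit SVE.  */
};

/* Hypothetical check: only a self-set covering the whole register is a
   true no-op.  A narrower one is a partial store, which (under the
   semantics argued above) clobbers the bytes it does not cover.  */
static bool
self_set_is_noop (struct self_set s)
{
  assert (s.store_bytes > 0 && s.store_bytes <= s.reg_bytes);
  return s.store_bytes == s.reg_bytes;
}
```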

jeff
