> On 17 Nov 2017, at 19:31, Jeff Law <l...@redhat.com> wrote:
>
> On 11/16/2017 11:50 AM, Alan Hayward wrote:
>>
>>> On 16 Nov 2017, at 18:24, Richard Biener <richard.guent...@gmail.com> wrote:
>>>
>>> On November 16, 2017 7:05:30 PM GMT+01:00, Jeff Law <l...@redhat.com> wrote:
>>>> On 11/16/2017 05:34 AM, Alan Hayward wrote:
>>>>> This is a set of patches aimed at supporting AArch64 SVE register
>>>>> preservation around TLS calls.
>>>>>
>>>>> Across a TLS call, AArch64 SVE does not explicitly preserve the
>>>>> SVE vector registers. However, the Neon vector registers are
>>>>> preserved. Due to the overlapping of registers, this means the lower
>>>>> 128 bits of all SVE vector registers will be preserved.
>>>>>
>>>>> The existing GCC code will currently incorrectly assume preservation
>>>>> of all of the SVE registers.
>>>>>
>>>>> This patch introduces a CLOBBER_HIGH expression. This behaves a bit
>>>>> like a CLOBBER expression. CLOBBER_HIGH can only refer to a single
>>>>> register. The mode of the expression indicates the size of the lower
>>>>> bits which will be preserved. If the register contains a value bigger
>>>>> than this mode then the code will treat the register as clobbered.
>>>>>
>>>>> This means that in order to evaluate whether a clobber high is
>>>>> relevant, we need to ensure the mode of the existing value in a
>>>>> register is tracked.
>>>>>
>>>>> The following patches in this series add support for CLOBBER_HIGH,
>>>>> with the final patch adding CLOBBER_HIGHs around TLS_DESC calls for
>>>>> aarch64. The testing performed on these patches is also detailed in
>>>>> the final patch.
>>>>>
>>>>> These patches are based on top of the linaro-dev/sve branch.
>>>>>
>>>>> A simpler alternative to this patch would be to assume all Neon and
>>>>> SVE registers are clobbered across TLS calls; however, this would be
>>>>> a performance regression against all AArch64 targets.
>>>> So just a couple of design questions.
>>>>
>>>> Presumably there's no reasonable way to set up GCC's view of the
>>>> register file to avoid this problem? ISTM that if the SVE register was
>>>> split into two, one for the part that overlapped with the Neon register
>>>> and one that did not, then this could be handled via standard
>>>> mechanisms?
>>>>
>>
>> Yes, that was an early alternative option for the patch.
>>
>> With that, it would affect every operation that uses SVE registers. A
>> simple add of two registers now has four inputs and two outputs. It
>> would get in the way when debugging any SVE dumps and be generally
>> annoying. It's possible that the code for that would all be in the
>> aarch64 target (making everyone else happy!), but I suspect that there
>> would still be strange dependency issues that'd need sorting in the
>> common code.
>>
>> Whereas with this patch, there are no new oddities in non-TLS
>> compiles/dumps. Although the patch touches a lot of files, the changes
>> are mostly restricted to places where standard clobbers were already
>> being checked.
> I'm not entirely sure that it would require doubling the number of
> inputs/outputs. It's not conceptually much different than how we
> describe DImode operations on 32-bit targets. The mode selects one or
> more consecutive registers, so you don't actually need anything weird in
> your patterns. This is pretty standard stuff.
Ok, fair enough.

>
> It would be painful in that the Neon regs would have to interleave with
> the upper part of the SVE regs in terms of register numbers. It would
> also mean that you couldn't tie together multiple Neon regs into
> something wider. I'm not sure if the latter would be an issue or not.

And there's also the weirdness that the register would not be split
evenly - it'd be a TI reg followed by a reg the size of multiple TIs.
All of that has the potential to complicate all non-SVE aarch64 code.

>
> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.   I'd
> totally forgotten about it.  And in fact it seems to come pretty close
> to what you need…

Yes, some of the code is similar to the way
TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
CLOBBER expr code served as a starting point for writing the patch.
The main difference here is that _PART_CLOBBERED applies around all
calls and is not tied to a specific instruction; it's part of the
calling ABI. Whereas clobber_high is explicitly tied to an expression
(tls_desc) - see the sketch below. It meant there wasn't really any
opportunity to reuse any existing code.

Alan.
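
P.S. To make that last distinction concrete, here is a rough sketch of
how a clobber_high might sit alongside ordinary clobbers inside an
AArch64 TLS descriptor pattern, so the effect is scoped to that one
insn rather than to every call. This is illustrative only - the insn
name, the choice of registers and the exact operands are assumptions,
not the pattern added by the final patch in the series:

;; Illustrative sketch only, not the pattern from the patch series.
;; Each clobber_high names a single hard register; the TI mode says the
;; low 128 bits survive the call, and any wider value held in that
;; register is treated as clobbered.
(define_insn "tlsdesc_small_sve_sketch"
  [(set (reg:DI R0_REGNUM)
        (unspec:DI [(match_operand 0 "aarch64_valid_symref" "S")]
                   UNSPEC_TLSDESC))
   (clobber (reg:DI LR_REGNUM))        ;; fully clobbered, as today
   (clobber_high (reg:TI V0_REGNUM))   ;; only bits above 128 are lost
   (clobber_high (reg:TI V1_REGNUM))
   ;; ... one clobber_high per remaining vector register ...
   (clobber_high (reg:TI V31_REGNUM))]
  "TARGET_TLS_DESC && TARGET_SVE"
  "...")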