On Mon, Nov 27, 2017 at 6:29 PM, Jeff Law <l...@redhat.com> wrote:
> On 11/23/2017 04:11 AM, Alan Hayward wrote:
>>
>>> On 22 Nov 2017, at 17:33, Jeff Law <l...@redhat.com> wrote:
>>>
>>> On 11/22/2017 04:31 AM, Alan Hayward wrote:
>>>>
>>>>> On 21 Nov 2017, at 03:13, Jeff Law <l...@redhat.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED. I'd
>>>>>>> totally forgotten about it. And in fact it seems to come pretty close
>>>>>>> to what you need…
>>>>>>
>>>>>> Yes, some of the code is similar to the way
>>>>>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
>>>>>> CLOBBER expr code served as a starting point for writing the patch.
>>>>>> The main difference here is that _PART_CLOBBERED is around all calls
>>>>>> and is not tied to a specific instruction; it’s part of the calling
>>>>>> ABI. Whereas clobber_high is explicitly tied to an expression
>>>>>> (tls_desc). It meant there wasn’t really any opportunity to reuse any
>>>>>> existing code.
>>>>> Understood. Though your first patch mentions that you're trying to
>>>>> describe partial preservation "around TLS calls". Presumably those are
>>>>> represented as normal insns, not call_insn.
>>>>>
>>>>> That brings me back to Richi's idea of exposing a set of the low subreg
>>>>> to itself using whatever mode is wide enough to cover the neon part of
>>>>> the register.
>>>>>
>>>>> That should tell the generic parts of the compiler that you're just
>>>>> clobbering the upper part and at least in theory you can implement in
>>>>> the aarch64 backend and the rest of the compiler should "just work"
>>>>> because that's the existing semantics of a subreg store.
>>>>>
>>>>> The only worry would be if a pass tried to get overly smart and
>>>>> considered that kind of set a nop -- but I think I'd argue that's simply
>>>>> wrong given the semantics of a partial store.
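As a rough illustration of the semantic claim above, the C sketch below models one vector register byte by byte. It is a toy, not GCC code: the 256-bit SVE width, the sve_reg type and the helper names are all hypothetical, and "undefined" bytes are simply flagged rather than given a value. It shows that clobber_high and a self-set of the low (Neon-sized) part leave the register in the same state, and that the only difference is that the self-set reads the low part first, which is exactly the use that liveness analysis sees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: one SVE register, assumed here to be 256 bits (32 bytes),
   whose low 16 bytes overlap the 128-bit Neon register.  */
#define SVE_BYTES 32
#define NEON_BYTES 16

typedef struct {
  uint8_t bytes[SVE_BYTES];
  bool known[SVE_BYTES]; /* false: contents undefined after the insn */
} sve_reg;

/* Mark every byte above the Neon part as undefined.  */
static void undefine_high(sve_reg *r) {
  for (int i = NEON_BYTES; i < SVE_BYTES; i++)
    r->known[i] = false;
}

/* clobber_high: the low part survives untouched and nothing is read.
   Returns whether the previous contents were read (i.e. a use).  */
static bool clobber_high(sve_reg *r) {
  undefine_high(r);
  return false;
}

/* (set (subreg:neon x) (subreg:neon x)) under partial-store semantics:
   the low part is read and written back unchanged, and the bytes above the
   store become undefined.  The read is what dataflow sees as a use.  */
static bool set_low_to_self(sve_reg *r) {
  uint8_t low[NEON_BYTES];
  memcpy(low, r->bytes, NEON_BYTES); /* read (use) */
  memcpy(r->bytes, low, NEON_BYTES); /* write back (def) */
  undefine_high(r);
  return true;
}

/* Equivalent states agree on which bytes are defined and on the value of
   every defined byte.  */
static bool same_state(const sve_reg *a, const sve_reg *b) {
  for (int i = 0; i < SVE_BYTES; i++) {
    if (a->known[i] != b->known[i])
      return false;
    if (a->known[i] && a->bytes[i] != b->bytes[i])
      return false;
  }
  return true;
}

/* Apply both forms to copies of one starting value: the resulting register
   contents match, but only the self-set reads the register.  */
static void check_forms(const sve_reg *start) {
  sve_reg a = *start, b = *start;
  bool a_reads = clobber_high(&a);
  bool b_reads = set_low_to_self(&b);
  assert(same_state(&a, &b));
  assert(!a_reads && b_reads);
}
```

In other words, the two forms agree on values; where they differ is only in whether the instruction counts as reading the register.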
>>>>>
>>>>
>>>> So, instead of using clobber_high(reg X), use set(reg X, reg X).
>>>> It’s something we considered, and then dismissed.
>>>>
>>>> The problem then is you are now using SET semantics on those registers,
>>>> and it would make the register live around the function, which might
>>>> not be the case. Whereas clobber semantics will just make the register
>>>> dead, which is exactly what we want (but only conditionally).
>>> ?!? A set of the subreg is the *exact* semantics you want. It says the
>>> low part is preserved while the upper part is clobbered across the TLS
>>> insns.
>>>
>>> jeff
>>
>> Consider the case where the TLS call is inside a loop. The compiler would
>> normally want to hoist that out of the loop. By adding a set(x,x) into the
>> parallel of the tls_desc we are now making x live across the loop, x is
>> dependent on the value from the previous iteration, and the tls_desc can
>> no longer be hoisted.
> Hmm. I think I see the problem you're trying to point out. Let me
> restate it and see if you agree.
>
> The low subreg set does clearly indicate the upper part of the SVE
> register is clobbered. The problem is from a liveness standpoint the
> compiler is considering the low part live, even though it's a self-set.
>
> In fact, if that is the case, then a single TLS call (independent of a
> loop) would make the low part of the register globally live. This
> should be testable. Include one of these low part self sets on the
> existing TLS calls and compile a little test function and let's look at
> the liveness data.
>
>
> Now it could be the case that various local analysis could sub-optimally
> handle things. You mention LICM. I know our original LICM did have a
> problem in that if it saw a use of a hard reg in a loop without seeing a
> set of that hard reg it considered the register varying within the loop.
> I have no idea if we carried that forward when the loop code was
> rewritten (when I looked at this it was circa 1992).
>
>
>>
>> Or consider a stream of code containing two tls_desc calls (ok, the
>> compiler might optimise one of the tls calls away, but this approach
>> should be reusable for other exprs). Between the two set(x,x)’s x is
>> considered live so the register allocator can’t use that register.
>> Given that we are applying this to all the neon registers, the register
>> allocator now throws an ICE because it can’t find any free hard neon
>> registers to use.
> Given your statements it sounds like the liveness infrastructure is
> making those neon regs globally live when it sees the low part subreg
> self-set. Let's confirm that one way or the other and see where it
> takes us.
Indeed, in (set (subreg:neon reg1) (subreg:neon reg1)) it appears that the
low part of reg1 is used and thus it is live, but liveness analysis can (and
should) simply ignore such sets.

> Jeff
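To make the liveness question concrete, here is a toy backward-liveness pass over one straight-line block. It is not GCC's df code: the insn struct, the 32-bit mask encoding of the vector registers and the ignore_self_sets switch are hypothetical. A self-set is counted either as a use plus a def (the behaviour Alan describes) or, following the suggestion above, as nothing at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy IR only: bit i of each mask stands for vector register v<i>, 32 regs.  */
typedef struct {
  uint32_t use;      /* registers read by the insn */
  uint32_t def;      /* registers written by the insn */
  uint32_t self_set; /* registers appearing as (set x x) in the insn */
} insn;

/* Backward liveness over one straight-line block.  live_in[i] receives the
   registers live on entry to insns[i].  A self-set is counted as a use plus
   a def unless ignore_self_sets is true, in which case it contributes
   neither, so a value held in the preserved low part stays live across it.  */
static void compute_liveness(const insn *insns, int n, uint32_t live_out,
                             bool ignore_self_sets, uint32_t *live_in) {
  assert(n >= 0);
  uint32_t live = live_out;
  for (int i = n - 1; i >= 0; i--) {
    uint32_t use = insns[i].use;
    uint32_t def = insns[i].def;
    if (!ignore_self_sets) {
      use |= insns[i].self_set;
      def |= insns[i].self_set;
    }
    /* live-in = (live-out minus defs) plus uses */
    live = (live & ~def) | use;
    live_in[i] = live;
  }
}
```

For the block "tls_desc; v0 = ...; ... = v0; tls_desc", with every vector register self-set at both calls and nothing live out, the naive mode reports all 32 registers live on entry to the block and between the two calls, matching both the "globally live" prediction and the allocator failure described above. With ignore_self_sets, only v0 is live, and only where it is used. The toy does not model the clobbered upper part at all, so it says nothing about how a value held in the full SVE width would be treated.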