Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On 8/3/2022 1:52 AM, Richard Sandiford via Gcc-patches wrote:
>> Takayuki 'January June' Suwa via Gcc-patches <gcc-patches@gcc.gnu.org>
>> writes:
>>> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
>>> data flow consistent, but it also increases register allocation pressure
>>> and thus often creates many unwanted register-to-register moves that
>>> cannot be optimized away.
>> There are two things here:
>>
>> - If emit_move_complex_parts emits a clobber of a hard register,
>>   then that's probably a bug/misfeature.  The point of the clobber is
>>   to indicate that the register has no useful contents.  That's useful
>>   for wide pseudos that are written to in parts, since it avoids the
>>   need to track the liveness of each part of the pseudo individually.
>>   But it shouldn't be necessary for hard registers, since subregs of
>>   hard registers are simplified to hard registers wherever possible
>>   (which on most targets is "always").
>>
>>   So I think the emit_move_complex_parts clobber should be restricted
>>   to !HARD_REGISTER_P, like the lower-subreg clobber is.  If that helps
>>   (if only partly) then it would be worth doing as its own patch.
> Agreed.
>
>>
>> - I think it'd be worth looking into more detail why a clobber makes
>>   a difference to register pressure.  A clobber of a pseudo register R
>>   shouldn't make R conflict with things that are live at the point of
>>   the clobber.
> Also agreed.
>
>>
>>> It seems just analogous to partial register
>>> stall which is a famous problem on processors that do register renaming.
>>>
>>> In my opinion, when the register to be clobbered is a composite of hard
>>> ones, we should clobber the individual elements separately, otherwise
>>> clear the entire to zero prior to use as the "init-regs" pass does (like
>>> partial register stall workarounds on x86 CPUs).  Such redundant zero
>>> constant assignments will be removed later in the "cprop_hardreg" pass.
>> I don't think we should rely on the zero being optimised away later.
>>
>> Emitting the zero also makes it harder for the register allocator
>> to elide the move.  For example, if we have:
>>
>>    (set (subreg:SI (reg:DI P) 0) (reg:SI R0))
>>    (set (subreg:SI (reg:DI P) 4) (reg:SI R1))
>>
>> then there is at least a chance that the RA could assign hard registers
>> R0:R1 to P, which would turn the moves into nops.  If we emit:
>>
>>    (set (reg:DI P) (const_int 0))
>>
>> beforehand then that becomes impossible, since R0 and R1 would then
>> conflict with P.
>>
>> TBH I'm surprised we still run init_regs for LRA.  I thought there was
>> a plan to stop doing that, but perhaps I misremember.
> I have vague memories of dealing with some of this nonsense a few
> release cycles ago.  I don't recall all the details, but init-regs +
> lower-subreg + regcprop + splitting all conspired to generate poor code
> on the MIPS targets.  See pr87761, though it doesn't include all my
> findings -- I can't recall if I walked through the entire tortured
> sequence in the gcc-patches discussion or not.
>
> I ended up working around it in the mips backend in conjunction with some
> changes to regcprop IIRC.

Thanks for the pointer, hadn't seen that.  And yeah, for the early-ish
passes, I guess the interaction between lower-subreg and init-regs is
important too, not just the interaction between lower-subreg and RA.
It probably also ties into the problems with overly-scalarised register
moves, like in PR 106106.

So maybe I was being too optimistic :-)

Richard
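
PS: the conflict argument in the quoted text (a clobber of P should leave
R0:R1 available for P, while a zero-initialisation of P should not) can be
sketched with a toy model.  This is plain C, not GCC code: the insn struct,
the register names and both functions are made up for illustration.  It
assumes a simple backward-liveness conflict builder in which a definition
conflicts with everything live after it, except the source of a copy, and
in which a clobber records no conflicts.  That last rule is the behaviour
argued for above, not a statement about what IRA or LRA actually do.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: none of these names exist in GCC.
   P_LO and P_HI stand for the two SImode halves of the DImode pseudo P. */
enum { R0, R1, P_LO, P_HI, NREGS };
#define BIT(r) (1u << (r))

typedef enum { CLOBBER, SET_CONST, COPY, USE } insn_kind;

typedef struct {
  insn_kind kind;
  unsigned defs;  /* bitmask of registers written */
  unsigned uses;  /* bitmask of registers read */
} insn;

/* (clobber P); P.lo = R0; P.hi = R1; use P */
const insn seq_clobber[] = {
  { CLOBBER,   BIT(P_LO) | BIT(P_HI), 0 },
  { COPY,      BIT(P_LO),             BIT(R0) },
  { COPY,      BIT(P_HI),             BIT(R1) },
  { USE,       0,                     BIT(P_LO) | BIT(P_HI) },
};

/* P = 0; P.lo = R0; P.hi = R1; use P */
const insn seq_zero[] = {
  { SET_CONST, BIT(P_LO) | BIT(P_HI), 0 },
  { COPY,      BIT(P_LO),             BIT(R0) },
  { COPY,      BIT(P_HI),             BIT(R1) },
  { USE,       0,                     BIT(P_LO) | BIT(P_HI) },
};

/* Walk the sequence backwards, tracking liveness and recording conflicts. */
void build_conflicts(const insn *seq, int n,
                     unsigned char conflict[NREGS][NREGS])
{
  unsigned live = 0;  /* nothing is live after the last insn */
  assert(n > 0);
  memset(conflict, 0, sizeof(unsigned char[NREGS][NREGS]));
  for (int i = n - 1; i >= 0; i--) {
    const insn *in = &seq[i];
    if (in->kind != CLOBBER) {  /* assumption: clobbers add no conflicts */
      for (int d = 0; d < NREGS; d++) {
        if (!(in->defs & BIT(d)))
          continue;
        for (int l = 0; l < NREGS; l++) {
          if (l == d || !(live & BIT(l)))
            continue;
          /* The source of a copy may share a register with its target. */
          if (in->kind == COPY && (in->uses & BIT(l)))
            continue;
          conflict[d][l] = conflict[l][d] = 1;
        }
      }
    }
    live = (live & ~in->defs) | in->uses;
  }
}

/* Nonzero if P could be assigned R0:R1, turning both copies into nops. */
int can_tie_to_sources(const insn *seq, int n)
{
  unsigned char conflict[NREGS][NREGS];
  build_conflicts(seq, n, conflict);
  return !conflict[P_LO][R0] && !conflict[P_HI][R1];
}
```

In this model can_tie_to_sources (seq_clobber, 4) returns 1 and
can_tie_to_sources (seq_zero, 4) returns 0: the zero store defines both
halves of P while R0 and R1 are still live, so each half conflicts with
both source registers.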