Re: Why does lower-subreg mark copied pseudos as "decomposable"?

Andrew Stubbs Thu, 19 Apr 2012 09:37:30 -0700

On 18/04/12 21:47, Richard Sandiford wrote:

I don't think the idea is that these cases are special in themselves.
What we're looking for are pseudos that _may_ be decomposed into
separate registers.  If one of the pseudos in the move is only used in
decomposable contexts (including nonvolatile loads and stores, as well
as copies to and from hard registers, etc.), then we may be able to
completely replace the original pseudo with two smaller ones.  E.g.:


     (set (reg:DI X) (mem:DI ...))
     ...
     (set (reg:DI Y) (reg:DI X))

In this case, X can be completely replaced by two SImode registers.

What isn't clear to me is why we don't seem to do the same for:

     (set (reg:DI X) (mem:DI ...))
     (set (mem:DI ...) (reg:DI X))

My reading would be: if the backend wanted to lower these then it wouldhave expanded the load differently in the first place. The fact that themachine description says "reg:DI", rather than "subreg:SI (reg:DI" showsthat the 64-bit load is the more-optimal method (at least in the view ofthe author).

(I have the same complaint about the lower-subreg zero-extendspecial-handling: if the backend wanted it lowered that way, why didn'tit code it that way?)

Look at pr43137; the old code was sub-optimal and was replaced with apre-lowered version. (Admittedly, it was the pseudo-to-pseudo bug thathighlighted the problem.) Now, I want to indicate the opposite, whenNEON is available: that preferred form for and extend is the non-loweredform, but this mysterious copy-rule has defeated me.

Perhaps we do and I'm just misreading the code.  Or perhaps it's just
too hard to get the costs right.  Splitting that would be moving even
further from what you want though :-)

No, I don't think you misunderstood it: in the example, X is used as"reg:DI" and that should prohibit lowering (even if there's a subreg usesomewhere else).


Here's another example that worries me:

        (set (reg:DI X) (mem:DI ...))
        ...
        (set (reg:DI Y) (reg:DI X))
        ...
        (set (...) (...:SI (subreg:SI (reg:DI Y) 0)))
        (set (...) (...:SI (subreg:SI (reg:DI Y) 4)))

Without the copy, there would be no lowering in this example. With thecopy (and let's remember that they're not added deliberately; they'rejust artefacts of the expand process) the Y register will be lowered,but the X not.

I worries me that these "optimizations" are relying on"undocumented"/"undefined"/unsomethinged behaviour (at best). In thisexample it probably won't make any difference to the registerallocation, but in pr43137 is was this pattern and the combination ofhard-regs and sign_extend being known to only modify one of the subregscaused the trouble.

The result of this extra feature is that if I copy the output of a
DImode insn *directly* to a DImode hard reg (say a return value) then
there's no decomposition, but if the expand pass happens to have put an
intermediate pseudo register (as it does do) then this extra rule
decomposes it most unhelpfully (ok, there's only actually a problem if
the compiler can reason that one subreg or the other is unchanged, as is
the case with sign_extend).


But remember that this pass is not designed for targets like NEON that
have lots of native double-word operations.  It's designed for targets
that don't.  I think you said earlier that your testcase was handled
correctly for non-NEON, which is the point: decomposing in that case
_may_ be a benefit (if we end up being able to replace all uses of a
doubleword pseudo with uses of 2 word pseudos) and should be no worse
otherwise.

True, but there ought to be a way to achieve the goal that doesn'tpenalize targets that have a few doubleword operations?

I'm coming to the conclusion that this (and the zero_extend/shiftspecial-handling) is some attempt to paper over the short-comings ofsome backends (ok, probably all, somewhere) that don't expand to saywhat they mean, but instead rely on split, or have insns that outputmultiple instructions.

Lowering a register that is only accessed via (subreg:SI (reg:DI ..))is, of course, a useful thing to do (it's what the machine descriptionis *asking* for), and we should do that after both expand and split, butthese extra "smarts" are deeply suspicious.

So, after having thought all this through again, unless somebody can
show why not, I propose that we remove this mis-feature entirely, or at
least disable it in the first pass.


I still prefer the idea of disabling in the first pass.  It'll need to
be tested on something like non-NEON ARM to see whether it makes things
worse or better there.  (I think size testing would be fine.)


I'll have a go, and see what happens.

Andrew

Re: Why does lower-subreg mark copied pseudos as "decomposable"?

Reply via email to