On 18/04/12 21:47, Richard Sandiford wrote:
I don't think the idea is that these cases are special in themselves.
What we're looking for are pseudos that _may_ be decomposed into
separate registers.  If one of the pseudos in the move is only used in
decomposable contexts (including nonvolatile loads and stores, as well
as copies to and from hard registers, etc.), then we may be able to
completely replace the original pseudo with two smaller ones.  E.g.:

     (set (reg:DI X) (mem:DI ...))
     ...
     (set (reg:DI Y) (reg:DI X))

In this case, X can be completely replaced by two SImode registers.

What isn't clear to me is why we don't seem to do the same for:

     (set (reg:DI X) (mem:DI ...))
     (set (mem:DI ...) (reg:DI X))

My reading would be: if the backend wanted to lower these then it would have expanded the load differently in the first place. The fact that the machine description says "reg:DI", rather than "subreg:SI (reg:DI" shows that the 64-bit load is the more-optimal method (at least in the view of the author).

(I have the same complaint about the lower-subreg zero-extend special-handling: if the backend wanted it lowered that way, why didn't it code it that way?)

Look at pr43137; the old code was sub-optimal and was replaced with a pre-lowered version. (Admittedly, it was the pseudo-to-pseudo bug that highlighted the problem.) Now, I want to indicate the opposite, when NEON is available: that preferred form for and extend is the non-lowered form, but this mysterious copy-rule has defeated me.

Perhaps we do and I'm just misreading the code.  Or perhaps it's just
too hard to get the costs right.  Splitting that would be moving even
further from what you want though :-)

No, I don't think you misunderstood it: in the example, X is used as "reg:DI" and that should prohibit lowering (even if there's a subreg use somewhere else).

Here's another example that worries me:

        (set (reg:DI X) (mem:DI ...))
        ...
        (set (reg:DI Y) (reg:DI X))
        ...
        (set (...) (...:SI (subreg:SI (reg:DI Y) 0)))
        (set (...) (...:SI (subreg:SI (reg:DI Y) 4)))

Without the copy, there would be no lowering in this example. With the copy (and let's remember that they're not added deliberately; they're just artefacts of the expand process) the Y register will be lowered, but the X not.

I worries me that these "optimizations" are relying on "undocumented"/"undefined"/unsomethinged behaviour (at best). In this example it probably won't make any difference to the register allocation, but in pr43137 is was this pattern and the combination of hard-regs and sign_extend being known to only modify one of the subregs caused the trouble.

The result of this extra feature is that if I copy the output of a
DImode insn *directly* to a DImode hard reg (say a return value) then
there's no decomposition, but if the expand pass happens to have put an
intermediate pseudo register (as it does do) then this extra rule
decomposes it most unhelpfully (ok, there's only actually a problem if
the compiler can reason that one subreg or the other is unchanged, as is
the case with sign_extend).

But remember that this pass is not designed for targets like NEON that
have lots of native double-word operations.  It's designed for targets
that don't.  I think you said earlier that your testcase was handled
correctly for non-NEON, which is the point: decomposing in that case
_may_ be a benefit (if we end up being able to replace all uses of a
doubleword pseudo with uses of 2 word pseudos) and should be no worse
otherwise.

True, but there ought to be a way to achieve the goal that doesn't penalize targets that have a few doubleword operations?

I'm coming to the conclusion that this (and the zero_extend/shift special-handling) is some attempt to paper over the short-comings of some backends (ok, probably all, somewhere) that don't expand to say what they mean, but instead rely on split, or have insns that output multiple instructions.

Lowering a register that is only accessed via (subreg:SI (reg:DI ..)) is, of course, a useful thing to do (it's what the machine description is *asking* for), and we should do that after both expand and split, but these extra "smarts" are deeply suspicious.

So, after having thought all this through again, unless somebody can
show why not, I propose that we remove this mis-feature entirely, or at
least disable it in the first pass.

I still prefer the idea of disabling in the first pass.  It'll need to
be tested on something like non-NEON ARM to see whether it makes things
worse or better there.  (I think size testing would be fine.)

I'll have a go, and see what happens.

Andrew

Reply via email to