The Sparc and rs6000 ports seem to use a peephole2 to get the ldd and
lfq instructions (respectively), but as far as I can tell nothing
encourages the register allocator to allocate the two registers as a
pair.  The peephole2 just picks up loads from adjacent memory locations
if the allocator happens to choose adjacent registers (is that
correct?) or if the variables are specified as living in hard
registers with the help of an asm.

Several other architectures have paired loads: some ARM targets have ldrd,
which can be cheaper than an ldm, and ia64 has a pair load.

GCC seems to do a good job of handling register-sized subregs of
larger modes that span two or four registers.  So if I could tell
GCC to turn:

       [(set (reg:SI X) (mem:SI (addr)))
        (set (reg:SI Y) (mem:SI (addr+4)))]

(where addr is aligned to DI) into something like:

       [(set (reg:DI T) (mem:DI (addr)))
        (set (reg:SI X) (subreg:SI (reg:DI T) 0))
        (set (reg:SI Y) (subreg:SI (reg:DI T) 4))]

and I could do so early enough, GCC would know to access the subregs
directly in the instruction(s) using the loaded values, and I would end
up loading the register pair and using the individual elements.  But it
has to be done early on; after register allocation, even if I could get
a DI temporary, I'd probably still have the two SI moves, and that's
probably not a win.
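
To make the transformation above concrete at the source level, here is
a minimal C sketch of the same idea.  It is only an illustration under
stated assumptions, not something GCC does today: pair32,
combine_narrow and combine_wide are hypothetical names, the 8-byte
alignment stands in for "addr is aligned to DI", and the memcpy calls
at byte offsets 0 and 4 stand in for the two subregs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pair type: two 32-bit words at addr and addr+4,
   with the whole object aligned to 8 bytes (DI alignment). */
typedef struct {
    _Alignas(8) uint32_t x;   /* at addr     */
    uint32_t y;               /* at addr + 4 */
} pair32;

/* The "before" form: two separate SI-sized loads. */
static uint64_t combine_narrow(const pair32 *p)
{
    uint32_t x = p->x;        /* (set (reg:SI X) (mem:SI (addr)))   */
    uint32_t y = p->y;        /* (set (reg:SI Y) (mem:SI (addr+4))) */
    return ((uint64_t)x << 32) | y;
}

/* The "after" form: one DI-sized load into a temporary T, then
   the two halves are taken from T by byte offset. */
static uint64_t combine_wide(const pair32 *p)
{
    uint64_t t;
    uint32_t x, y;

    assert((uintptr_t)p % 8 == 0);          /* addr is DI-aligned */
    memcpy(&t, p, sizeof t);                /* single 8-byte load */
    memcpy(&x, (const unsigned char *)&t, sizeof x);      /* byte 0 */
    memcpy(&y, (const unsigned char *)&t + 4, sizeof y);  /* byte 4 */
    return ((uint64_t)x << 32) | y;
}
```

Taking the halves by byte offset rather than by shifting keeps the
sketch independent of endianness, which matches how the subreg byte
offsets are written above.  Both functions should return the same
value for any properly aligned pair32.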

I've tinkered with splits but can't seem to get it to work.  And I'm
aware that trying to do it too early might be bad, because the pseudos
might not be live later on, or might end up in memory.  But you
can't do it at peephole2, because the registers won't be paired.

Any ideas?  Am I going at it from the right angle?

--
Why are ``tolerant'' people so intolerant of intolerant people?
