The rs6000 and Sparc ports seem to use a peephole2 to get the lfq or ldd instructions (respectively), but it looks like there's nothing that makes the register allocator allocate the registers together. The peephole2 just picks up loads from adjacent memory locations if the allocator happens to choose adjacent registers (is that correct?) or if the variables are specified as living in hard registers with the help of an asm.
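To make concrete what I think the peephole2 depends on, here is a minimal sketch in C of the kind of test it would have to pass. This is not code from either port: the struct, the function name, the even/odd register-pair requirement and the 8-byte alignment requirement are all my assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one SImode load after register allocation:
   the destination hard register number and the byte address read.  */
struct si_load {
  unsigned regno;
  uintptr_t addr;
};

/* Return true if the two loads could be merged into one doubleword
   load.  Assumes (for illustration only) that the pair instruction
   needs an even/odd register pair and an 8-byte-aligned address.  */
static bool
can_pair_loads (struct si_load first, struct si_load second)
{
  /* The allocator must happen to have picked an adjacent pair.  */
  bool regs_adjacent = (first.regno % 2 == 0)
                       && (second.regno == first.regno + 1);

  /* The loads must read adjacent words of an aligned doubleword.  */
  bool mems_adjacent = (first.addr % 8 == 0)
                       && (second.addr == first.addr + 4);

  return regs_adjacent && mems_adjacent;
}
```

The memory half of the test is known from the source code, but the register half depends entirely on what the allocator happened to choose, which is the part I'd like to influence.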
Several other architectures have paired loads: some ARM targets have ldrd, which can be cheaper than an ldm, and ia64 has a pair load. It seems like GCC does a good job of knowing how to modify register-sized subregs of two- or four-register larger modes. So if I could tell GCC to turn:

  [(set (reg:SI X) (mem:SI (addr)))
   (set (reg:SI Y) (mem:SI (addr+4)))]

(where addr is aligned to DI) into something like:

  [(set (reg:DI T) (mem:DI (addr)))
   (set (reg:SI X) (subreg:SI (reg:DI T) 0))
   (set (reg:SI Y) (subreg:SI (reg:DI T) 4))]

and I could do so early enough, GCC would know to access the subregs directly in the instruction(s) using the loaded values, and I would end up loading the register pair and using the individual elements.

But it has to be done early on; after register allocation, even if I could get a DI temporary, I'd probably still have the two SI moves, and that's probably not a win.

I've tinkered with splits but can't seem to get it to work. And I'm aware that trying to do it too early might be bad, because the pseudos might not be live later on, or might end up in memory. But you can't do it at peephole2, because the registers won't be paired.

Any ideas? Am I going at it from the right angle?

-- 
Why are ``tolerant'' people so intolerant of intolerant people?