On Wed, Nov 12, 2014 at 03:26:35AM -0600, Segher Boessenkool wrote:
> On Tue, Nov 11, 2014 at 08:27:22PM -0500, Michael Meissner wrote:
> > > Before the patch, the final reduction used *vsx_reduc_splus_v2df; after
> > > the patch, it is *vsx_reduc_plus_v2df_scalar. The former does a vector
> > > add, the latter a float add. And it uses the same pseudoregister for the
> > > accumulator throughout. IRA decides a register is more expensive than
> > > memory for this, I suppose because it wants both V2DF and DF? It doesn't
> > > seem to like the subreg very much.
> >
> > I haven't looked into it in detail (I've been a little busy with the upper
> > regs patch), but I suspect the problem is that 128-bit and 64-bit types
> > cannot overlap (i.e. rs6000_cannot_change_mode_class returns true). This is
> > due to the fact that scalars in VSX registers occupy the upper 64-bits,
> > which would not match the compiler's notion that they should be in the
> > bottom 64-bits.
>
> You suspect correctly. Hacking around that in cannot_change_mode_class
> doesn't help, subreg_get_info disallows it next.
>
> Changing the pattern so it does two extracts instead of an extract and
> a subreg works (you get an fmr for the high part though, register alloc
> doesn't know dest=src is for free here).
>
> _Should_ the subreg thing work? Or should the patterns be fixed?
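To make the placement mismatch concrete, here is a minimal sketch in C. It is
only a model, not GCC or rs6000 code: the type vsx_reg_model and every helper
name are hypothetical, and it assumes a 128-bit register viewed as two
doublewords in register order, with the generic lowpart rule selecting the
least significant one.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit VSX register, indexed in register
   order (not memory order): dw[0] is the most significant ("upper")
   doubleword, dw[1] the least significant ("bottom") one.  */
typedef struct { uint64_t dw[2]; } vsx_reg_model;

/* Bit-exact conversions between double and its 64-bit pattern.  */
static uint64_t double_bits(double d)
{
  uint64_t u;
  memcpy(&u, &d, sizeof u);
  return u;
}

static double bits_double(uint64_t u)
{
  double d;
  memcpy(&d, &u, sizeof d);
  return d;
}

/* Build a V2DF-like value from its upper and bottom elements.  */
static vsx_reg_model make_v2df(double upper, double bottom)
{
  vsx_reg_model r;
  r.dw[0] = double_bits(upper);
  r.dw[1] = double_bits(bottom);
  return r;
}

/* Where the hardware keeps a DF scalar: the upper doubleword.  */
static double hw_scalar_view(const vsx_reg_model *r)
{
  return bits_double(r->dw[0]);
}

/* Where a generic "low part of a wider register" assumption would
   look: the bottom doubleword.  */
static double generic_lowpart_view(const vsx_reg_model *r)
{
  return bits_double(r->dw[1]);
}
```

With a register built by make_v2df(1.5, 2.5), hw_scalar_view returns 1.5 while
generic_lowpart_view returns 2.5, so in this model a no-op mode change from
V2DF to DF names a different element than the one a scalar instruction reads.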
As I said, we cannot allow CANNOT_CHANGE_MODE_CLASS to return false for this
case, because the hardware just does not agree with what GCC believes is the
natural placement for smaller values inside of larger register fields. I
suspect even if you add new target support macros to fix it, it will be a game
of whack-a-mole to find all of the places where there are hidden assumptions in
the compiler about subreg ordering.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797