On Tue, Nov 11, 2014 at 01:10:01AM -0600, Segher Boessenkool wrote:
> On Mon, Nov 10, 2014 at 05:36:24PM -0500, Michael Meissner wrote:
> > However, the double pattern is completely broken.  This cannot go in.
>
> [snip]
>
> > It is unacceptable to have to do the inner loop doing a load, vector add,
> > and store in the loop.
>
> Before the patch, the final reduction used *vsx_reduc_splus_v2df; after
> the patch, it is *vsx_reduc_plus_v2df_scalar.  The former does a vector
> add, the latter a float add.  And it uses the same pseudoregister for the
> accumulator throughout.  IRA decides a register is more expensive than
> memory for this, I suppose because it wants both V2DF and DF?  It doesn't
> seem to like the subreg very much.

I haven't looked into it in detail (I've been a little busy with the upper
regs patch), but I suspect the problem is that 128-bit and 64-bit types
cannot overlap (i.e. rs6000_cannot_change_mode_class returns true).  This is
because scalars in VSX registers occupy the upper 64 bits, which does not
match the compiler's notion that a scalar should be in the bottom 64 bits.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797