On Tue, Dec 13, 2016 at 04:29:36PM -0600, Segher Boessenkool wrote:
> On Tue, Dec 13, 2016 at 10:15:02AM -0500, Michael Meissner wrote:
> > This patch should address the comments in the last patch.
> >
> > I have tested this patch with bootstrap builds and make check regression
> > tests on a little endian Power8 64-bit system and a big endian Power7
> > 32/64-bit system with no regressions.  Can I check this into the trunk?
>
> > +  else if (mode == V8HImode)
> > +    {
> > +      rtx tmp_gpr_si = (GET_CODE (tmp_gpr) == SCRATCH
> > +                        ? dest_si
> > +                        : gen_rtx_REG (SImode, REGNO (tmp_gpr)));
>
> I think you have these the wrong way around?

The rs6000_split_vec_extract_var function is called from several places in
vsx.md to do a variable vector extract.  In looking at each of the cases, there
is a GPR tmp register for each of the calls, so I could modify it to move:

	gcc_assert (REG_P (tmp_gpr));

before the support for VEXTU{B,H,W}{L,R}X instructions, and leave the

	gcc_assert (REG_P (tmp_altivec));

and remove the test for SCRATCH.

In the original version of the code, the non-variable case also called
rs6000_split_vec_extract_var, and it did not have a scratch register.

> You didn't address the reload_completed on all the splitters yet; is there
> a reason for it?

Because there are 4 different cases that generate wildly different code, based
on what register class or memory is used:

    1)	For DImode, DFmode, and SFmode we could be extracting to a vector
	register, and we would not use VEXTU{B,H,W}{L,R}X but instead do a
	VSLO and other operations;

    2)	Even if the result is in a GPR, DImode and DFmode have to do VSLO
	because there is no VEXTUD{L,R}X instruction that deposits the value
	in a GPR.  Similarly, on ISA 2.07, we don't have those instructions,
	so we have to generate the more complex instructions;

    3)	The variable extract code also handles variable extracts that are
	stores;

    4)	Finally there is the new case where we are extracting to a GPR when we
	have ISA 3.0 instructions.

Basically, until we know all of the details (i.e. after register allocation),
we can't do the split, because the code is different.

There is also the practical issue that, due to the funky way the scalar parts
are not in the bottom part of the register, SUBREGs really don't work between
64-bit and 128-bit items that go in vector registers.  After register
allocation, we can do gen_rtx_REG (<mode>, <regno>) to change the types, but
that really doesn't work before register allocation.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797