On Wed, Mar 02, 2022 at 03:54:29PM -0500, Michael Meissner wrote: > Optimize signed DImode -> TImode on power10.
> On power10, GCC tries to optimize the signed conversion from DImode to > TImode by using the vextsd2q instruction. However to generate this > instruction, it would have to generate 3 direct moves (1 from the GPR > registers to the altivec registers, and 2 from the altivec registers to > the GPR register). > > This patch generates the shift right immediate instruction to do the > conversion if the target/source registers ares GPR registers like it does > on earlier systems. If the target/source registers are Altivec registers, > it will generate the vextsd2q instruction. > PR target/104698 > * config/rs6000/vsx.md (mtvsrdd_diti_w1): Delete. > (extendditi2): Convert from define_expand to > define_insn_and_split. Replace with code to deal with both GPR > registers and with altivec registers. > > gcc/testsuite/ > PR target/104698 > * gcc.target/powerpc/pr104698-1.c: New test. > * gcc.target/powerpc/pr104698-2.c: New test. > +;; Sign extend DI to TI. We provide both GPR targets and Altivec targets on > +;; power10. On earlier systems, the machine independent code will generate a > +;; shift left to sign extend the 64-bit value to 128-bit. > +;; > +;; If the register allocator prefers to use GPR registers, we will use a > shift > +;; left instruction to sign extend the 64-bit value to 128-bit. > +;; > +;; If the register allocator prefers to use Altivec registers on power10, > +;; generate the vextsd2q instruction. > +(define_insn_and_split "extendditi2" > + [(set (match_operand:TI 0 "register_operand" "=r,r,v,v,v") > + (sign_extend:TI (match_operand:DI 1 "input_operand" "r,m,r,wa,Z"))) > + (clobber (reg:DI CA_REGNO))] > + "TARGET_POWERPC64 && TARGET_POWER10" What happens with -m32 -m{no,}-powerpc64? > + "#" > + "&& reload_completed" > + [(pc)] > +{ > + rtx dest = operands[0]; > + rtx src = operands[1]; > + int dest_regno = reg_or_subregno (dest); > + > + /* Handle conversion to GPR registers. Load up the low part and then do > + a sign extension to the upper part. */ > + if (INT_REGNO_P (dest_regno)) > + { > + rtx dest_hi = gen_highpart (DImode, dest); > + rtx dest_lo = gen_lowpart (DImode, dest); > + > + emit_move_insn (dest_lo, src); > + emit_insn (gen_ashrdi3 (dest_hi, dest_lo, GEN_INT (63))); Please use src instead of dest_lo. This always works, because you did the low-part move first. > + DONE; > + } > + > + /* For conversion to an Altivec register, generate either a splat operation > + or a load rightmost double word instruction. Both instructions gets the > + DImode value into the lower 64 bits, and then do the vextsd2q > + instruction. */ > + (trailing whitespace) > + else if (ALTIVEC_REGNO_P (dest_regno)) > + { > + if (MEM_P (src)) > + emit_insn (gen_vsx_lxvrdx (dest, src)); > + else > + { > + rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno); > + emit_insn (gen_vsx_splat_v2di (dest_v2di, src)); > + } > + > + emit_insn (gen_extendditi2_vector (dest, dest)); > + DONE; > + } This patch needs testing on BE (and 32-bit as well of course). You did not get rid of the unspecs this way. Oh well, that can happen later. > + > + else > + gcc_unreachable (); > +} > + [(set_attr "length" "8")]) This needs to set the insn type for each alternative (they have different costs, for memory at least!) > +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */ > +/* { dg-final { scan-assembler-not {\mmfvsrld\M} } } */ > +/* { dg-final { scan-assembler-not {\mmtvsrdd\M} } } */ You could just do > +/* { dg-final { scan-assembler-not {\mm[ft]vsr} } } */ and even all future variants would be handled :-) Looks good, just those last details :-) Segher