I originally wrote the code to generate LFIWAX and LFIWZX for power7 around 2010. At the time, we did not allow SImode to go into floating point and vector registers. As part of the power9 work, we now allow SImode to go into FP/vector registers for 64-bit code targeting -mcpu=power8 or higher. But we never went back and tweaked the LFIWAX/LFIWZX support.
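For reference, LFIWAX loads a 32-bit word and sign extends it to 64 bits in a floating point register, LFIWZX does the same with zero extension, and FCFID then converts the signed 64-bit integer to floating point. The C sketch below models that two-step path. It also models the vupkhsw/xxspltd pair that the patch uses on ISA 2.07 to sign extend a word already held in an Altivec register. All function and type names are hypothetical, the vector model uses big-endian element numbering, and it assumes the 32-bit value sits in word element 1 (bits 32:63 of doubleword 0), as it does after lxsiwzx or mtvsrwz. It illustrates the semantics only and is not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar models of the Power ISA instructions discussed in
   this mail.  They illustrate semantics only; they are not GCC code. */

/* LFIWAX (also LXSIWAX, MTVSRWA): sign extend a 32-bit word to 64 bits. */
int64_t model_lfiwax(int32_t word)
{
    return (int64_t) word;
}

/* LFIWZX (also LXSIWZX, MTVSRWZ): zero extend a 32-bit word to 64 bits. */
int64_t model_lfiwzx(uint32_t word)
{
    return (int64_t) word;      /* always fits, so never negative */
}

/* FCFID: convert a signed 64-bit integer to double. */
double model_fcfid(int64_t x)
{
    return (double) x;
}

/* int -> double: sign extend, then a signed 64-bit convert. */
double int_to_double(int32_t i)
{
    return model_fcfid(model_lfiwax(i));
}

/* unsigned int -> double: zero extend, then the same *signed* convert.
   This is exact because the zero-extended value is never negative. */
double uint_to_double(uint32_t u)
{
    return model_fcfid(model_lfiwzx(u));
}

/* A 128-bit vector register, using big-endian element numbering. */
typedef struct { int32_t w[4]; } v4si;
typedef struct { int64_t d[2]; } v2di;

/* VUPKHSW: sign extend words 0 and 1 into doublewords 0 and 1. */
v2di model_vupkhsw(v4si v)
{
    v2di r = { { (int64_t) v.w[0], (int64_t) v.w[1] } };
    return r;
}

/* XXSPLTD: copy doubleword UIM (0 or 1) into both doublewords. */
v2di model_xxspltd(v2di v, int uim)
{
    assert(uim == 0 || uim == 1);
    v2di r = { { v.d[uim], v.d[uim] } };
    return r;
}

/* ISA 2.07 sign extension of a word held in word element 1 (assumed to be
   the scalar slot), mirroring the big-endian branch of the new lfiwax split.
   Scalar floating point instructions then read doubleword 0. */
int64_t p8_vector_sign_extend(v4si v)
{
    v2di t = model_xxspltd(model_vupkhsw(v), 1);
    return t.d[0];
}
```

The unsigned case shows why a signed float of the zero-extended DImode value is a valid way to do an unsigned conversion. After zero extension, the 64-bit value is never negative, so uint_to_double(0xFFFFFFFFu) yields 4294967295.0, whereas int_to_double(-1) yields -1.0.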
I was writing code for a possible future PowerPC machine, and the new code added an attribute that caused some of the -mno-vsx tests to fail. This was because the floatsi<mode>2_lfiwax and floatunssi<mode>2_lfiwzx patterns did not have a non-VSX alternative, and the attribute processing needed to process the alternatives before the first split pass.

While looking at the code, I decided to also clean up the underlying lfiwax and lfiwzx patterns. With this patch, on machines that support SImode in floating point and vector registers, we split the conversion to SFmode or DFmode after register allocation into a sign/zero extend operation followed by the conversion itself. On machines that do not support SImode in floating point and vector registers, we continue to use the lfiwax and lfiwzx unspec patterns.

I have tested this code by doing bootstraps and make checks on both little endian and big endian systems, and there are no regressions. I also built (but did not run) the following versions of Spec 2006, and every version built:

    little endian power8 64-bit
    little endian power9 64-bit
    big endian power7 64-bit
    big endian power8 64-bit
    big endian power9 64-bit
    big endian power7 32-bit
    big endian power8 32-bit
    big endian power9 32-bit

In general, the 32-bit code seems to use far fewer instructions, including fewer lfiwax/lfiwzx instructions. For power8/power9 32-bit code, there were more mtvsrwz and mtvsrwa instructions. The 64-bit code is more similar to the previous output, but I notice that we aren't generating as many mtvsrd instructions, and are instead generating mtvsrwa and mtvsrwz instructions.

Can I check this into the trunk?

[gcc]
2019-06-14  Michael Meissner  <meiss...@linux.ibm.com>

	PR target/90822
	* config/rs6000/rs6000.md (lfiwax): Update comment.  Add
	vupkhsw/xxspltd or vupklsw/xxspltd split support to sign extend
	SImode on ISA 2.07 systems.
	(floatsi<mode>2_lfiwax): Rewrite.  Do not split the insn until
	after register allocation so that we should always get lfiwax
	generated instead of doing a gpr load and direct move.
	On 64-bit systems that allow SImode in vector registers, split to
	do a sign_extend instead of lfiwax.
	(floatsi<mode>2_lfiwax_mem): Delete, no longer used.
	(lfiwzx): Move so this is next to lfiwax.
	(floatunssi<mode>2_lfiwzx): Rewrite.  Do not split the insn until
	after register allocation so that we should always get lfiwzx
	generated instead of doing a gpr load and direct move.  On 64-bit
	systems that allow SImode in vector registers, split to do a
	zero_extend instead of lfiwzx.
	(floatunssi<mode>2_lfiwzx_mem): Delete, no longer used.

[gcc/testsuite]
2019-06-14  Michael Meissner  <meiss...@linux.ibm.com>

	PR target/90822
	* gcc.target/powerpc/pr81348.c: Use -O2 instead of -Og.

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 272166)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -5231,88 +5231,46 @@ (define_insn "*xxsel<mode>"
 
 ;; Conversions to and from floating-point.
 
-; We don't define lfiwax/lfiwzx with the normal definition, because we
-; don't want to support putting SImode in FPR registers.
-(define_insn "lfiwax"
-  [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wa,wa,v")
-        (unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r,v")]
+; On 32-bit systems, we need to have special versions of LFIWAX and LFIWZX because
+; the sign/zero extend insns are not defined.
+(define_insn_and_split "lfiwax"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wa,wa,v,v")
+        (unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r,v,v")]
                    UNSPEC_LFIWAX))]
   "TARGET_HARD_FLOAT && TARGET_LFIWAX"
   "@
    lfiwax %0,%y1
    lxsiwax %x0,%y1
    mtvsrwa %x0,%1
-   vextsw2d %0,%1"
-  [(set_attr "type" "fpload,fpload,mffgpr,vecexts")
-   (set_attr "isa" "*,p8v,p8v,p9v")])
-
-; This split must be run before register allocation because it allocates the
-; memory slot that is needed to move values to/from the FPR.  We don't allocate
-; it earlier to allow for the combiner to merge insns together where it might
-; not be needed and also in case the insns are deleted as dead code.
-
-(define_insn_and_split "floatsi<mode>2_lfiwax"
-  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Fv>")
-        (float:SFDF (match_operand:SI 1 "nonimmediate_operand" "r")))
-   (clobber (match_scratch:DI 2 "=wa"))]
-  "TARGET_HARD_FLOAT && TARGET_LFIWAX
-   && <SI_CONVERT_FP> && can_create_pseudo_p ()"
-  "#"
-  ""
-  [(pc)]
+   vextsw2d %0,%1
+   #"
+  "&& reload_completed && TARGET_P8_VECTOR && !TARGET_P9_VECTOR
+   && altivec_register_operand (operands[1], SImode)"
+  [(const_int 0)]
 {
   rtx dest = operands[0];
   rtx src = operands[1];
-  rtx tmp;
+  int dest_regno = REGNO (dest);
+  int src_regno = REGNO (src);
+  rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
+  rtx src_v4si = gen_rtx_REG (V4SImode, src_regno);
 
-  if (!MEM_P (src) && TARGET_POWERPC64 && TARGET_DIRECT_MOVE)
-    tmp = convert_to_mode (DImode, src, false);
-  else
+  if (BYTES_BIG_ENDIAN)
     {
-      tmp = operands[2];
-      if (GET_CODE (tmp) == SCRATCH)
-        tmp = gen_reg_rtx (DImode);
-      if (MEM_P (src))
-        {
-          src = rs6000_force_indexed_or_indirect_mem (src);
-          emit_insn (gen_lfiwax (tmp, src));
-        }
-      else
-        {
-          rtx stack = rs6000_allocate_stack_temp (SImode, false, true);
-          emit_move_insn (stack, src);
-          emit_insn (gen_lfiwax (tmp, stack));
-        }
+      emit_insn (gen_altivec_vupkhsw (dest_v2di, src_v4si));
+      emit_insn (gen_vsx_xxspltd_v2di (dest_v2di, dest_v2di, const1_rtx));
+      DONE;
     }
-  emit_insn (gen_floatdi<mode>2 (dest, tmp));
-  DONE;
-}
-  [(set_attr "length" "12")
-   (set_attr "type" "fpload")])
-
-(define_insn_and_split "floatsi<mode>2_lfiwax_mem"
-  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Fv>")
-        (float:SFDF
-         (sign_extend:DI
-          (match_operand:SI 1 "indexed_or_indirect_operand" "Z"))))
-   (clobber (match_scratch:DI 2 "=wa"))]
-  "TARGET_HARD_FLOAT && TARGET_LFIWAX && <SI_CONVERT_FP>"
-  "#"
-  ""
-  [(pc)]
-{
-  operands[1] = rs6000_force_indexed_or_indirect_mem (operands[1]);
-  if (GET_CODE (operands[2]) == SCRATCH)
-    operands[2] = gen_reg_rtx (DImode);
-  if (TARGET_P8_VECTOR)
-    emit_insn (gen_extendsidi2 (operands[2], operands[1]));
   else
-    emit_insn (gen_lfiwax (operands[2], operands[1]));
-  emit_insn (gen_floatdi<mode>2 (operands[0], operands[2]));
-  DONE;
+    {
+      emit_insn (gen_altivec_vupklsw (dest_v2di, src_v4si));
+      emit_insn (gen_vsx_xxspltd_v2di (dest_v2di, dest_v2di, const0_rtx));
+      DONE;
+    }
 }
-  [(set_attr "length" "8")
-   (set_attr "type" "fpload")])
+  [(set_attr "type" "fpload,fpload,mffgpr,vecexts,vecexts")
+   (set_attr "isa" "*,p8v,p8v,p9v,p8v")
+   (set_attr "length" "*,*,*,*,8")])
 
 (define_insn "lfiwzx"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wa,wa,wa")
@@ -5327,67 +5285,61 @@ (define_insn "lfiwzx"
   [(set_attr "type" "fpload,fpload,mftgpr,vecexts")
    (set_attr "isa" "*,p8v,p8v,p9v")])
 
-(define_insn_and_split "floatunssi<mode>2_lfiwzx"
-  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Fv>")
-        (unsigned_float:SFDF (match_operand:SI 1 "nonimmediate_operand" "r")))
-   (clobber (match_scratch:DI 2 "=wa"))]
-  "TARGET_HARD_FLOAT && TARGET_LFIWZX && <SI_CONVERT_FP>"
+;; Keep the SImode -> DImode conversion along with DImode -> SF/DFmode through
+;; register allocation so that the register allocator generates a LFIWAX or
+;; LXSIWAX instruction instead of a LWA instruction plus a MTVSRD* instruction
+;; on power8 and LWA + STD + LFD on power7/power6 systems.
+
+;; LFIWAX LFIWAX LXSIWAX MTVSRWA VEXTSW2D VUPKLSW+SPLAT
+;; The first alternative is to support -mno-vsx and -mcpu=power6.
+(define_insn_and_split "floatsi<mode>2_lfiwax"
+  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa,wa,wa,wa,wa")
+        (float:SFDF
+         (match_operand:SI 1 "nonimmediate_operand" "Z,Z,Z,r,v,v")))
+   (clobber (match_scratch:DI 2 "=d,d,v,wa,v,v"))]
+  "TARGET_HARD_FLOAT && TARGET_LFIWAX && <SI_CONVERT_FP>"
   "#"
-  ""
-  [(pc)]
+  "&& reload_completed"
+  [(match_dup 3)
+   (set (match_dup 0)
+        (float:SFDF (match_dup 2)))]
 {
-  rtx dest = operands[0];
   rtx src = operands[1];
-  rtx tmp;
+  rtx tmp = operands[2];
 
-  if (!MEM_P (src) && TARGET_POWERPC64 && TARGET_DIRECT_MOVE)
-    tmp = convert_to_mode (DImode, src, true);
-  else
-    {
-      tmp = operands[2];
-      if (GET_CODE (tmp) == SCRATCH)
-        tmp = gen_reg_rtx (DImode);
-      if (MEM_P (src))
-        {
-          src = rs6000_force_indexed_or_indirect_mem (src);
-          emit_insn (gen_lfiwzx (tmp, src));
-        }
-      else
-        {
-          rtx stack = rs6000_allocate_stack_temp (SImode, false, true);
-          emit_move_insn (stack, src);
-          emit_insn (gen_lfiwzx (tmp, stack));
-        }
-    }
-  emit_insn (gen_floatdi<mode>2 (dest, tmp));
-  DONE;
+  operands[3] = (TARGET_DIRECT_MOVE_64BIT
+                 ? gen_extendsidi2 (tmp, src)
+                 : gen_lfiwax (tmp, src));
 }
-  [(set_attr "length" "12")
-   (set_attr "type" "fpload")])
+  [(set_attr "length" "8,8,8,8,8,12")
+   (set_attr "type" "fpload,fpload,fpload,mffgpr,fp,fp")
+   (set_attr "isa" "*,p7v,p8v,p8v,p9v,p8v")])
 
-(define_insn_and_split "floatunssi<mode>2_lfiwzx_mem"
-  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Fv>")
+;; LFIWZX LXSIWZX MTVSRWZ XXEXTRACTUW
+;; The first alternative is to support -mno-vsx.
+(define_insn_and_split "floatunssi<mode>2_lfiwzx"
+  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa,wa,wa,wa")
         (unsigned_float:SFDF
-         (zero_extend:DI
-          (match_operand:SI 1 "indexed_or_indirect_operand" "Z"))))
-   (clobber (match_scratch:DI 2 "=wa"))]
+         (match_operand:SI 1 "nonimmediate_operand" "Z,Z,Z,r,wa")))
+   (clobber (match_scratch:DI 2 "=d,d,v,wa,wa"))]
   "TARGET_HARD_FLOAT && TARGET_LFIWZX && <SI_CONVERT_FP>"
   "#"
-  ""
-  [(pc)]
+  "&& reload_completed"
+  [(match_dup 3)
+   (set (match_dup 0)
+        (float:SFDF (match_dup 2)))]
 {
-  operands[1] = rs6000_force_indexed_or_indirect_mem (operands[1]);
-  if (GET_CODE (operands[2]) == SCRATCH)
-    operands[2] = gen_reg_rtx (DImode);
-  if (TARGET_P8_VECTOR)
-    emit_insn (gen_zero_extendsidi2 (operands[2], operands[1]));
-  else
-    emit_insn (gen_lfiwzx (operands[2], operands[1]));
-  emit_insn (gen_floatdi<mode>2 (operands[0], operands[2]));
-  DONE;
+  rtx src = operands[1];
+  rtx tmp = operands[2];
+
+  operands[3] = (TARGET_DIRECT_MOVE_64BIT
+                 ? gen_zero_extendsidi2 (tmp, src)
+                 : gen_lfiwzx (tmp, src));
+
 }
   [(set_attr "length" "8")
-   (set_attr "type" "fpload")])
+   (set_attr "type" "fpload,fpload,fpload,mffgpr,vecexts")
+   (set_attr "isa" "*,p7v,p8v,p8v,p9v")])
 
 ; For each of these conversions, there is a define_expand, a define_insn
 ; with a '#' template, and a define_split (with C code).  The idea is
Index: gcc/testsuite/gcc.target/powerpc/pr81348.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr81348.c	(revision 272165)
+++ gcc/testsuite/gcc.target/powerpc/pr81348.c	(working copy)
@@ -1,9 +1,11 @@
 /* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
 /* { dg-require-effective-target powerpc_p9vector_ok } */
-/* { dg-options "-mdejagnu-cpu=power9 -Og" } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
 
 /* PR target/81348: Compiler died in doing short->float conversion due to using
-   the wrong register in a define_split.  */
+   the wrong register in a define_split.  Originally it failed with -Og.
+   Changes due to PR 90822 meant that -Og does not generate the lxsihzx and
+   vextsh2d instructions, but -O2 does.  */
 
 int a;
 short b;

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797