This patch is three things at once: a fix for a regression that dates back to GCC 4.9, a code improvement for GLIBC, and a fix for potential bugs exposed by the recent changes that allow small integers (32-bit integers, SImode in particular) in the floating point and vector registers.
The core of the problem is that when a SFmode (32-bit binary floating point) value is a scalar in the floating point and vector registers, it is stored internally as a 64-bit binary floating point value. This means that if you could look at the value through the SUBREG mechanism, you might see the wrong bits. Before the recent changes to add small integer support went in, this was less of an issue, since the only integer mode allowed in the floating point and vector registers was 64-bit integers (DImode).

The regression comes in how SFmode values are moved between the general purpose and the floating point/vector registers. Up through the power7, the way to move a SFmode value from one register set to the other was a store followed by a load. Doing a back to back store and load to the same location can cause serious performance problems on recent power systems. On the power7 and power8, we would emit a special NOP that forces the two instructions into different dispatch groups, which helps somewhat.

When the power8 (ISA 2.07) came out, it added direct move instructions and instructions to convert between the scalar double precision and single precision formats. I added the appropriate secondary_reload support so that if the register allocator wanted to move a SFmode value between register banks, it would create a temporary and emit the appropriate instructions to move the value. This worked in the GCC 4.8 time frame. Some time in the 4.9 time frame this broke, and the register allocator would more often generate a store and load instead of the direct move sequence. However, simple test cases continued to use the direct move instructions. In checking power9 (ISA 3.0) code, it is even more likely to use the store/load than the direct move. On power8 spec runs we have seen the effect of these store/load sequences on some benchmarks from the next generation of the Spec suite.

The optimization that the GLIBC implementers have requested (PR 71977) was to speed up code sequences they use in implementing the single precision math library. They often need to extract or modify bits in floating point values (for example, setting exponents or mantissas). For example, in e_powf.c you see code like this after macro expansion:

    typedef union
    {
      float value;
      u_int32_t word;
    } ieee_float_shape_type;

    float t1;
    int32_t is;

    /* ... */

    do {
      ieee_float_shape_type gf_u;
      gf_u.value = (t1);
      (is) = gf_u.word;
    } while (0);

    do {
      ieee_float_shape_type sf_u;
      sf_u.word = (is & 0xfffff000);
      (t1) = sf_u.value;
    } while (0);

Originally, I just wrote a peephole2 to catch the above code, and it worked in small test cases on the power8. But it didn't work on larger programs or on the power9. I also wanted to fix the performance issue that we have seen. And I became convinced that for GCC 7 this was a ticking time bomb: eventually somebody would write code that intermixed SImode and SFmode, and it would get the wrong value.

The main part of the patch is to not let the compiler generate:

    (set (reg:SF) (subreg:SF (reg:SI)))

or:

    (set (reg:SI) (subreg:SI (reg:SF)))

Most of the register predicates eventually call gpc_reg_operand, so it was simple to put the check there, and in the other predicates that do not call gpc_reg_operand. I created new insns that do the move between formats and allocate the needed temporary with a match_scratch. A few places then needed to not have the check (the movsi/movsf expanders themselves, and the insn splitters for the format conversion insns), and I added a predicate for that.
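To make the format issue concrete, here is a small standalone illustration (mine, not part of the patch or of GLIBC): the same value has a different bit pattern in the 32-bit memory format than in the 64-bit scalar format used within the floating point and vector registers, which is why the bits have to go through the convert instructions before they can be treated as an integer.

    #include <stdio.h>
    #include <string.h>
    #include <inttypes.h>

    int
    main (void)
    {
      float f = 1.0f;      /* SFmode value in its 32-bit memory format.  */
      double d = f;        /* The 64-bit format SFmode scalars use inside the registers.  */
      uint32_t sf_bits;
      uint64_t df_bits;

      memcpy (&sf_bits, &f, sizeof (sf_bits));
      memcpy (&df_bits, &d, sizeof (df_bits));

      /* Prints 3f800000 and 3ff0000000000000.  The 32-bit pattern is not
         simply one half of the 64-bit pattern, so reading the register
         contents through a plain SUBREG would see the wrong bits; the
         value has to be converted (xscvdpspn / xscvspdpn) first.  */
      printf ("SF bits: %08" PRIx32 "  DF bits: %016" PRIx64 "\n",
              sf_bits, (uint64_t) df_bits);
      return 0;
    }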
I have built the patches on a little endian power8, a big endian power8 (64-bit only), and a big endian power7 (both 32-bit and 64-bit). There were no regression failures.

In addition, I built Spec 2006 with the fixes and without, and did a quick comparison of the results (1 run). I am re-running the spec results with the code merged to today's trunk, and with 3 runs, to isolate the occasional benchmark that goes off in the weeds.

Of the 29 benchmarks in Spec 2006 CPU, 6 had changes in the instructions generated (perlbench, gromacs, cactusADM, namd, povray, wrf). In the single run I did, there were no regressions, and 2 or 3 benchmarks improved:

	namd		6%
	tonto		3%
	libquantum	6%

However, single runs of libquantum have varied by as much as 10%, so without seeing more runs I will not count it. Namd was one of the benchmarks that saw changes in code generation, but tonto did not have code changes. I suspect that having the separate converter unspec insn allowed the scheduler to move things in between the move and the use.

So, in summary, can I check these changes into the GCC 7 trunk? Given that it does fix a long standing regression in GCC 6 that hurts performance, do you want me to develop the changes for a GCC 6 backport as well? I realize it is skating on thin ice whether this counts as a feature or a fix.

[gcc]
2016-12-29  Michael Meissner  <meiss...@linux.vnet.ibm.com>

	PR target/71977
	PR target/70568
	PR target/78823
	* config/rs6000/predicates.md (sf_subreg_operand): New predicate to
	return true if the operand contains a SUBREG mixing SImode and
	SFmode on 64-bit VSX systems with direct move.  This can be a
	problem in that we have to know whether the SFmode value should be
	represented in the 32-bit memory format or the 64-bit scalar
	format used within the floating point and vector registers.
	(altivec_register_operand): Do not return true if the operand
	contains a SUBREG mixing SImode and SFmode.
	(vsx_register_operand): Likewise.
	(vsx_reg_sfsubreg_ok): New predicate.  Like vsx_register_operand,
	but do not check if there is a SUBREG mixing SImode and SFmode on
	64-bit VSX systems with direct move.
	(vfloat_operand): Do not return true if the operand contains a
	SUBREG mixing SImode and SFmode.
	(vint_operand): Likewise.
	(vlogical_operand): Likewise.
	(gpc_reg_operand): Likewise.
	(int_reg_operand): Likewise.
	* config/rs6000/rs6000.c (valid_sf_si_move): New function to
	determine if a MOVSI or MOVSF operation contains SUBREGs that mix
	SImode and SFmode.
	(rs6000_emit_move): If we have a MOVSI or MOVSF operation that
	contains SUBREGs that mix SImode and SFmode, call special insns
	that can allocate the necessary temporary registers to convert
	between SFmode and SImode within the registers.
	* config/rs6000/vsx.md (SFBOOL_*): Add peephole2 to recognize when
	we are converting a SFmode to a SImode, moving the result to a GPR
	register, doing a single AND/IOR/XOR operation, and then moving it
	back to a vector register.  Change the insns recognized to move
	the integer value to the vector register and do the operation
	there.  This code occurs quite a bit in the GLIBC math library in
	float math functions.
	(peephole2 to speed up GLIBC math functions): Likewise.
	* config/rs6000/rs6000-protos.h (valid_sf_si_move): Add
	declaration.
	* config/rs6000/rs6000.h (TARGET_NO_SF_SUBREG): New internal
	target macros to say whether we need to avoid SUBREGs mixing
	SImode and SFmode.
	(TARGET_ALLOW_SF_SUBREG): Likewise.
	* config/rs6000/rs6000.md (UNSPEC_SF_FROM_SI): New unspecs.
	(UNSPEC_SI_FROM_SF): Likewise.
	(iorxor): Change spacing.
	(and_ior_xor): New iterator for AND, IOR, and XOR.
	(movsi_from_sf): New insn to handle moving SFmode values to SImode
	registers, where we need to convert the value from the format used
	within the register to the memory format.
	(movdi_from_sf_zero_ext): Optimize zero extending movsi_from_sf.
	(mov<mode>_hardfloat, FMOVE32 iterator): Don't allow moving SUBREGs
	mixing SImode and SFmode on 64-bit VSX systems with direct move.
	(movsf_from_si): New insn to handle moving SImode values to SFmode
	registers, where we need to convert the value from the 32-bit
	memory format to the format used within the floating point and
	vector registers.
	(fma<mode>4): Change register_operand to gpc_reg_operand to
	prevent SUBREGs mixing SImode and SFmode.
	(fms<mode>4): Likewise.
	(fnma<mode>4): Likewise.
	(fnms<mode>4): Likewise.
	(nfma<mode>4): Likewise.
	(nfms<mode>4): Likewise.

[gcc/testsuite]
2016-12-29  Michael Meissner  <meiss...@linux.vnet.ibm.com>

	PR target/71977
	PR target/70568
	PR target/78823
	* gcc.target/powerpc/pr71977-1.c: New tests to check whether, on
	64-bit VSX systems with direct move, we optimize common code
	sequences in the GLIBC math library for float math functions.
	* gcc.target/powerpc/pr71977-2.c: Likewise.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
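As a concrete illustration (not part of the patch), the function below restates the e_powf.c idiom quoted earlier; the function name is mine. The expected code generation is essentially what the new pr71977 tests check for their variants of this pattern: a single xxland on the VSX side, with no mfvsrd/srdi and no store/load round trip.

    #include <stdint.h>

    typedef union
    {
      float value;
      uint32_t word;
    } ieee_float_shape_type;

    /* Clear the low 12 mantissa bits of T1, as e_powf.c does.  With the
       patch, on a 64-bit power8 the AND is expected to be done as an
       xxland in the VSX registers instead of going through a store/load
       or a mfvsrd/srdi ... mtvsrd round trip.  */
    float
    truncate_mantissa (float t1)
    {
      ieee_float_shape_type u;

      u.value = t1;
      u.word &= 0xfffff000;

      return u.value;
    }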
Index: gcc/config/rs6000/predicates.md =================================================================== --- gcc/config/rs6000/predicates.md (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 243966) +++ gcc/config/rs6000/predicates.md (.../gcc/config/rs6000) (working copy) @@ -31,12 +31,47 @@ (define_predicate "count_register_operan (match_test "REGNO (op) == CTR_REGNO || REGNO (op) > LAST_VIRTUAL_REGISTER"))) +;; Return 1 if op is a SUBREG that is used to look at a SFmode value as +;; and integer or vice versa. +;; +;; In the normal case where SFmode is in a floating point/vector register, it +;; is stored as a DFmode and has a different format. If we don't transform the +;; value, things that use logical operations on the values will get the wrong +;; value. +;; +;; If we don't have 64-bit and direct move, this conversion will be done by +;; store and load, instead of by fiddling with the bits within the register. +(define_predicate "sf_subreg_operand" + (match_code "subreg") +{ + rtx inner_reg = SUBREG_REG (op); + machine_mode inner_mode = GET_MODE (inner_reg); + + if (TARGET_ALLOW_SF_SUBREG || !REG_P (inner_reg)) + return 0; + + if ((mode == SFmode && GET_MODE_CLASS (inner_mode) == MODE_INT) + || (GET_MODE_CLASS (mode) == MODE_INT && inner_mode == SFmode)) + { + if (INT_REGNO_P (REGNO (inner_reg))) + return 0; + + return 1; + } + return 0; +}) + ;; Return 1 if op is an Altivec register. (define_predicate "altivec_register_operand" (match_operand 0 "register_operand") { if (GET_CODE (op) == SUBREG) - op = SUBREG_REG (op); + { + if (TARGET_NO_SF_SUBREG && sf_subreg_operand (op, mode)) + return 0; + + op = SUBREG_REG (op); + } if (!REG_P (op)) return 0; @@ -52,6 +87,27 @@ (define_predicate "vsx_register_operand" (match_operand 0 "register_operand") { if (GET_CODE (op) == SUBREG) + { + if (TARGET_NO_SF_SUBREG && sf_subreg_operand (op, mode)) + return 0; + + op = SUBREG_REG (op); + } + + if (!REG_P (op)) + return 0; + + if (REGNO (op) >= FIRST_PSEUDO_REGISTER) + return 1; + + return VSX_REGNO_P (REGNO (op)); +}) + +;; Like vsx_register_operand, but allow SF SUBREGS +(define_predicate "vsx_reg_sfsubreg_ok" + (match_operand 0 "register_operand") +{ + if (GET_CODE (op) == SUBREG) op = SUBREG_REG (op); if (!REG_P (op)) @@ -69,7 +125,12 @@ (define_predicate "vfloat_operand" (match_operand 0 "register_operand") { if (GET_CODE (op) == SUBREG) - op = SUBREG_REG (op); + { + if (TARGET_NO_SF_SUBREG && sf_subreg_operand (op, mode)) + return 0; + + op = SUBREG_REG (op); + } if (!REG_P (op)) return 0; @@ -86,7 +147,12 @@ (define_predicate "vint_operand" (match_operand 0 "register_operand") { if (GET_CODE (op) == SUBREG) - op = SUBREG_REG (op); + { + if (TARGET_NO_SF_SUBREG && sf_subreg_operand (op, mode)) + return 0; + + op = SUBREG_REG (op); + } if (!REG_P (op)) return 0; @@ -103,7 +169,13 @@ (define_predicate "vlogical_operand" (match_operand 0 "register_operand") { if (GET_CODE (op) == SUBREG) - op = SUBREG_REG (op); + { + if (TARGET_NO_SF_SUBREG && sf_subreg_operand (op, mode)) + return 0; + + op = SUBREG_REG (op); + } + if (!REG_P (op)) return 0; @@ -221,6 +293,9 @@ (define_predicate "const_0_to_15_operand (match_test "IN_RANGE (INTVAL (op), 0, 15)"))) ;; Return 1 if op is a register that is not special. +;; Disallow (SUBREG:SF (REG:SI)) and (SUBREG:SI (REG:SF)) on VSX systems where +;; you need to be careful in moving a SFmode to SImode and vice versa due to +;; the fact that SFmode is represented as DFmode in the VSX registers. 
(define_predicate "gpc_reg_operand" (match_operand 0 "register_operand") { @@ -228,7 +303,12 @@ (define_predicate "gpc_reg_operand" return 0; if (GET_CODE (op) == SUBREG) - op = SUBREG_REG (op); + { + if (TARGET_NO_SF_SUBREG && sf_subreg_operand (op, mode)) + return 0; + + op = SUBREG_REG (op); + } if (!REG_P (op)) return 0; @@ -246,7 +326,8 @@ (define_predicate "gpc_reg_operand" }) ;; Return 1 if op is a general purpose register. Unlike gpc_reg_operand, don't -;; allow floating point or vector registers. +;; allow floating point or vector registers. Since vector registers are not +;; allowed, we don't have to reject SFmode/SImode subregs. (define_predicate "int_reg_operand" (match_operand 0 "register_operand") { @@ -254,7 +335,12 @@ (define_predicate "int_reg_operand" return 0; if (GET_CODE (op) == SUBREG) - op = SUBREG_REG (op); + { + if (TARGET_NO_SF_SUBREG && sf_subreg_operand (op, mode)) + return 0; + + op = SUBREG_REG (op); + } if (!REG_P (op)) return 0; @@ -266,6 +352,8 @@ (define_predicate "int_reg_operand" }) ;; Like int_reg_operand, but don't return true for pseudo registers +;; We don't have to check for SF SUBREGS because pseudo registers +;; are not allowed, and SF SUBREGs are ok within GPR registers. (define_predicate "int_reg_operand_not_pseudo" (match_operand 0 "register_operand") { Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 243966) +++ gcc/config/rs6000/rs6000.c (.../gcc/config/rs6000) (working copy) @@ -10341,6 +10341,39 @@ rs6000_emit_le_vsx_move (rtx dest, rtx s } } +/* Return whether a SFmode or SImode move can be done without converting one + mode to another. This arrises when we have: + + (SUBREG:SF (REG:SI ...)) + (SUBREG:SI (REG:SF ...)) + + and one of the values is in a floating point/vector register, where SFmode + scalars are stored in DFmode format. */ + +bool +valid_sf_si_move (rtx dest, rtx src, machine_mode mode) +{ + if (TARGET_ALLOW_SF_SUBREG) + return true; + + if (mode != SFmode && GET_MODE_CLASS (mode) != MODE_INT) + return true; + + if (!SUBREG_P (src) || !sf_subreg_operand (src, mode)) + return true; + + /*. Allow (set (SUBREG:SI (REG:SF)) (SUBREG:SI (REG:SF))). */ + if (SUBREG_P (dest)) + { + rtx dest_subreg = SUBREG_REG (dest); + rtx src_subreg = SUBREG_REG (src); + return GET_MODE (dest_subreg) == GET_MODE (src_subreg); + } + + return false; +} + + /* Emit a move from SOURCE to DEST in mode MODE. */ void rs6000_emit_move (rtx dest, rtx source, machine_mode mode) @@ -10371,6 +10404,39 @@ rs6000_emit_move (rtx dest, rtx source, gcc_unreachable (); } + /* If we are running before register allocation on a 64-bit machine with + direct move, and we see either: + + (set (reg:SF xxx) (subreg:SF (reg:SI yyy) zzz)) (or) + (set (reg:SI xxx) (subreg:SI (reg:SF yyy) zzz)) + + convert these into a form using UNSPEC. This is due to SFmode being + stored within a vector register in the same format as DFmode. We need to + convert the bits before we can use a direct move or operate on the bits in + the vector register as an integer type. + + Skip things like (set (SUBREG:SI (...) (SUBREG:SI (...)). 
*/ + if (TARGET_DIRECT_MOVE_64BIT && !reload_in_progress && !reload_completed + && !lra_in_progress + && (!SUBREG_P (dest) || !sf_subreg_operand (dest, mode)) + && SUBREG_P (source) && sf_subreg_operand (source, mode)) + { + rtx inner_source = SUBREG_REG (source); + machine_mode inner_mode = GET_MODE (inner_source); + + if (mode == SImode && inner_mode == SFmode) + { + emit_insn (gen_movsi_from_sf (dest, inner_source)); + return; + } + + if (mode == SFmode && inner_mode == SImode) + { + emit_insn (gen_movsf_from_si (dest, inner_source)); + return; + } + } + /* Check if GCC is setting up a block move that will end up using FP registers as temporaries. We must make sure this is acceptable. */ if (GET_CODE (operands[0]) == MEM Index: gcc/config/rs6000/vsx.md =================================================================== --- gcc/config/rs6000/vsx.md (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 243966) +++ gcc/config/rs6000/vsx.md (.../gcc/config/rs6000) (working copy) @@ -3897,3 +3897,124 @@ (define_insn "*vinsert4b_di_internal" "TARGET_P9_VECTOR" "xxinsertw %x0,%x1,%3" [(set_attr "type" "vecperm")]) + + + +;; Attempt to optimize some common operations using logical operations to pick +;; apart SFmode operations. This is tricky, because we can't just do the +;; operation using VSX logical operations directly, because in the PowerPC, +;; SFmode is represented internally as DFmode in the vector registers. + +;; The insns for dealing with SFmode in GPR registers looks like: +;; (set (reg:V4SF reg2) (unspec:V4SF [(reg:SF reg1)] UNSPEC_VSX_CVDPSPN)) +;; +;; (set (reg:DI reg3) (unspec:DI [(reg:V4SF reg2)] UNSPEC_P8V_RELOAD_FROM_VSX)) +;; +;; (set (reg:DI reg3) (lshiftrt:DI (reg:DI reg3) (const_int 32))) +;; +;; (set (reg:DI reg5) (and:DI (reg:DI reg3) (reg:DI reg4))) +;; +;; (set (reg:DI reg6) (ashift:DI (reg:DI reg5) (const_int 32))) +;; +;; (set (reg:SF reg7) (unspec:SF [(reg:DI reg6)] UNSPEC_P8V_MTVSRD)) +;; +;; (set (reg:SF reg7) (unspec:SF [(reg:SF reg7)] UNSPEC_VSX_CVSPDPN)) + +(define_code_iterator sf_logical [and ior xor]) + +(define_constants + [(SFBOOL_TMP_GPR 0) ;; GPR temporary + (SFBOOL_TMP_VSX 1) ;; vector temporary + (SFBOOL_MFVSR_D 2) ;; move to gpr dest + (SFBOOL_MFVSR_A 3) ;; move to gpr src + (SFBOOL_BOOL_D 4) ;; and/ior/xor dest + (SFBOOL_BOOL_A1 5) ;; and/ior/xor arg1 + (SFBOOL_BOOL_A2 6) ;; and/ior/xor arg1 + (SFBOOL_SHL_D 7) ;; shift left dest + (SFBOOL_SHL_A 8) ;; shift left arg + (SFBOOL_MTVSR_D 9) ;; move to vecter dest + (SFBOOL_BOOL_A_DI 10) ;; SFBOOL_BOOL_A1/A2 as DImode + (SFBOOL_TMP_VSX_DI 11) ;; SFBOOL_TMP_VSX as DImode + (SFBOOL_MTVSR_D_V4SF 12)]) ;; SFBOOL_MTVSRD_D as V4SFmode + +(define_peephole2 + [(match_scratch:DI SFBOOL_TMP_GPR "r") + (match_scratch:V4SF SFBOOL_TMP_VSX "wa") + + ;; MFVSRD + (set (match_operand:DI SFBOOL_MFVSR_D "int_reg_operand") + (unspec:DI [(match_operand:V4SF SFBOOL_MFVSR_A "vsx_register_operand")] + UNSPEC_P8V_RELOAD_FROM_VSX)) + + ;; SRDI + (set (match_dup SFBOOL_MFVSR_D) + (lshiftrt:DI (match_dup SFBOOL_MFVSR_D) + (const_int 32))) + + ;; AND/IOR/XOR operation on int + (set (match_operand:SI SFBOOL_BOOL_D "int_reg_operand") + (sf_logical:SI (match_operand:SI SFBOOL_BOOL_A1 "int_reg_operand") + (match_operand:SI SFBOOL_BOOL_A2 "reg_or_cint_operand"))) + + ;; SLDI + (set (match_operand:DI SFBOOL_SHL_D "int_reg_operand") + (ashift:DI (match_operand:DI SFBOOL_SHL_A "int_reg_operand") + (const_int 32))) + + ;; MTVSRD + (set (match_operand:SF SFBOOL_MTVSR_D "vsx_register_operand") + (unspec:SF [(match_dup 
SFBOOL_SHL_D)] UNSPEC_P8V_MTVSRD))] + + "TARGET_POWERPC64 && TARGET_DIRECT_MOVE + /* The REG_P (xxx) tests prevents SUBREG's, which allows us to use REGNO + to compare registers, when the mode is different. */ + && REG_P (operands[SFBOOL_MFVSR_D]) && REG_P (operands[SFBOOL_BOOL_D]) + && REG_P (operands[SFBOOL_BOOL_A1]) && REG_P (operands[SFBOOL_SHL_D]) + && REG_P (operands[SFBOOL_SHL_A]) && REG_P (operands[SFBOOL_MTVSR_D]) + && (REG_P (operands[SFBOOL_BOOL_A2]) + || CONST_INT_P (operands[SFBOOL_BOOL_A2])) + && (REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_MFVSR_D]) + || peep2_reg_dead_p (3, operands[SFBOOL_MFVSR_D])) + && (REGNO (operands[SFBOOL_MFVSR_D]) == REGNO (operands[SFBOOL_BOOL_A1]) + || (REG_P (operands[SFBOOL_BOOL_A2]) + && REGNO (operands[SFBOOL_MFVSR_D]) + == REGNO (operands[SFBOOL_BOOL_A2]))) + && REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_SHL_A]) + && (REGNO (operands[SFBOOL_SHL_D]) == REGNO (operands[SFBOOL_BOOL_D]) + || peep2_reg_dead_p (4, operands[SFBOOL_BOOL_D])) + && peep2_reg_dead_p (5, operands[SFBOOL_SHL_D])" + [(set (match_dup SFBOOL_TMP_GPR) + (ashift:DI (match_dup SFBOOL_BOOL_A_DI) + (const_int 32))) + + (set (match_dup SFBOOL_TMP_VSX_DI) + (match_dup SFBOOL_TMP_GPR)) + + (set (match_dup SFBOOL_MTVSR_D_V4SF) + (sf_logical:V4SF (match_dup SFBOOL_MFVSR_A) + (match_dup SFBOOL_TMP_VSX)))] +{ + rtx bool_a1 = operands[SFBOOL_BOOL_A1]; + rtx bool_a2 = operands[SFBOOL_BOOL_A2]; + int regno_mfvsr_d = REGNO (operands[SFBOOL_MFVSR_D]); + int regno_tmp_vsx = REGNO (operands[SFBOOL_TMP_VSX]); + int regno_mtvsr_d = REGNO (operands[SFBOOL_MTVSR_D]); + + if (CONST_INT_P (bool_a2)) + { + rtx tmp_gpr = operands[SFBOOL_TMP_GPR]; + emit_move_insn (tmp_gpr, bool_a2); + operands[SFBOOL_BOOL_A_DI] = tmp_gpr; + } + else + { + int regno_bool_a1 = REGNO (bool_a1); + int regno_bool_a2 = REGNO (bool_a2); + int regno_bool_a = (regno_mfvsr_d == regno_bool_a1 + ? regno_bool_a2 : regno_bool_a1); + operands[SFBOOL_BOOL_A_DI] = gen_rtx_REG (DImode, regno_bool_a); + } + + operands[SFBOOL_TMP_VSX_DI] = gen_rtx_REG (DImode, regno_tmp_vsx); + operands[SFBOOL_MTVSR_D_V4SF] = gen_rtx_REG (V4SFmode, regno_mtvsr_d); +}) Index: gcc/config/rs6000/rs6000-protos.h =================================================================== --- gcc/config/rs6000/rs6000-protos.h (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 243966) +++ gcc/config/rs6000/rs6000-protos.h (.../gcc/config/rs6000) (working copy) @@ -153,6 +153,7 @@ extern void rs6000_fatal_bad_address (rt extern rtx create_TOC_reference (rtx, rtx); extern void rs6000_split_multireg_move (rtx, rtx); extern void rs6000_emit_le_vsx_move (rtx, rtx, machine_mode); +extern bool valid_sf_si_move (rtx, rtx, machine_mode); extern void rs6000_emit_move (rtx, rtx, machine_mode); extern rtx rs6000_secondary_memory_needed_rtx (machine_mode); extern machine_mode rs6000_secondary_memory_needed_mode (machine_mode); Index: gcc/config/rs6000/rs6000.h =================================================================== --- gcc/config/rs6000/rs6000.h (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 243966) +++ gcc/config/rs6000/rs6000.h (.../gcc/config/rs6000) (working copy) @@ -608,6 +608,12 @@ extern int rs6000_vector_align[]; && TARGET_POWERPC64) #define TARGET_VEXTRACTUB (TARGET_P9_VECTOR && TARGET_DIRECT_MOVE \ && TARGET_UPPER_REGS_DI && TARGET_POWERPC64) + + +/* Whether we should avoid (SUBREG:SI (REG:SF) and (SUBREG:SF (REG:SI). 
*/ +#define TARGET_NO_SF_SUBREG TARGET_DIRECT_MOVE_64BIT +#define TARGET_ALLOW_SF_SUBREG (!TARGET_DIRECT_MOVE_64BIT) + /* This wants to be set for p8 and newer. On p7, overlapping unaligned loads are slow. */ #define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 243966) +++ gcc/config/rs6000/rs6000.md (.../gcc/config/rs6000) (working copy) @@ -150,6 +150,8 @@ (define_c_enum "unspec" UNSPEC_IEEE128_CONVERT UNSPEC_SIGNBIT UNSPEC_DOLOOP + UNSPEC_SF_FROM_SI + UNSPEC_SI_FROM_SF ]) ;; @@ -564,7 +566,8 @@ (define_code_attr return_pred [(return " (define_code_attr return_str [(return "") (simple_return "simple_")]) ; Logical operators. -(define_code_iterator iorxor [ior xor]) +(define_code_iterator iorxor [ior xor]) +(define_code_iterator and_ior_xor [and ior xor]) ; Signed/unsigned variants of ops. (define_code_iterator any_extend [sign_extend zero_extend]) @@ -6754,6 +6757,157 @@ (define_insn "*movsi_internal1_single" [(set_attr "type" "*,*,load,store,*,*,*,mfjmpr,mtjmpr,*,*,fpstore,fpload") (set_attr "length" "4,4,4,4,4,4,8,4,4,4,4,4,4")]) +;; Like movsi, but adjust a SF value to be used in a SI context, i.e. +;; (set (reg:SI ...) (subreg:SI (reg:SF ...) 0)) +;; +;; Because SF values are actually stored as DF values within the vector +;; registers, we need to convert the value to the vector SF format when +;; we need to use the bits in a union or similar cases. We only need +;; to do this transformation when the value is a vector register. Loads, +;; stores, and transfers within GPRs are assumed to be safe. +;; +;; This is a more general case of reload_gpr_from_vsxsf. That insn must have +;; no alternatives, because the call is created as part of secondary_reload, +;; and operand #2's register class is used to allocate the temporary register. +;; This function is called before reload, and it creates the temporary as +;; needed. + +;; MR LWZ LFIWZX LXSIWZX STW +;; STFS STXSSP STXSSPX VSX->GPR MTVSRWZ +;; VSX->VSX + +(define_insn_and_split "movsi_from_sf" + [(set (match_operand:SI 0 "rs6000_nonimmediate_operand" + "=r, r, ?*wI, ?*wH, m, + m, wY, Z, r, wIwH, + ?wK") + + (unspec:SI [(match_operand:SF 1 "input_operand" + "r, m, Z, Z, r, + f, wu, wu, wIwH, r, + wK")] + UNSPEC_SI_FROM_SF)) + + (clobber (match_scratch:V4SF 2 + "=X, X, X, X, X, + X, X, X, wa, X, + wa"))] + + "TARGET_NO_SF_SUBREG + && (register_operand (operands[0], SImode) + || register_operand (operands[1], SFmode))" + "@ + mr %0,%1 + lwz%U1%X1 %0,%1 + lfiwzx %0,%y1 + lxsiwzx %x0,%y1 + stw%U0%X0 %1,%0 + stfs%U0%X0 %1,%0 + stxssp %1,%0 + stxsspx %x1,%y0 + # + mtvsrwz %x0,%1 + #" + "&& reload_completed + && register_operand (operands[0], SImode) + && vsx_reg_sfsubreg_ok (operands[1], SFmode)" + [(const_int 0)] +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx op2 = operands[2]; + rtx op0_di = gen_rtx_REG (DImode, REGNO (op0)); + + emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1)); + + if (int_reg_operand (op0, SImode)) + { + emit_insn (gen_p8_mfvsrd_4_disf (op0_di, op2)); + emit_insn (gen_lshrdi3 (op0_di, op0_di, GEN_INT (32))); + } + else + { + rtx op1_v16qi = gen_rtx_REG (V16QImode, REGNO (op1)); + rtx byte_off = VECTOR_ELT_ORDER_BIG ? 
const0_rtx : GEN_INT (12); + emit_insn (gen_vextract4b (op0_di, op1_v16qi, byte_off)); + } + + DONE; +} + [(set_attr "type" + "*, load, fpload, fpload, store, + fpstore, fpstore, fpstore, mftgpr, mffgpr, + veclogical") + + (set_attr "length" + "4, 4, 4, 4, 4, + 4, 4, 4, 12, 4, + 8")]) + +;; movsi_from_sf with zero extension +;; +;; RLDICL LWZ LFIWZX LXSIWZX VSX->GPR +;; MTVSRWZ VSX->VSX + +(define_insn_and_split "*movdi_from_sf_zero_ext" + [(set (match_operand:DI 0 "gpc_reg_operand" + "=r, r, ?*wI, ?*wH, r, + wIwH, ?wK") + + (zero_extend:DI + (unspec:SI [(match_operand:SF 1 "input_operand" + "r, m, Z, Z, wIwH, + r, wK")] + UNSPEC_SI_FROM_SF))) + + (clobber (match_scratch:V4SF 2 + "=X, X, X, X, wa, + X, wa"))] + + "TARGET_DIRECT_MOVE_64BIT + && (register_operand (operands[0], DImode) + || register_operand (operands[1], SImode))" + "@ + rldicl %0,%1,0,32 + lwz%U1%X1 %0,%1 + lfiwzx %0,%y1 + lxsiwzx %x0,%y1 + # + mtvsrwz %x0,%1 + #" + "&& reload_completed + && vsx_reg_sfsubreg_ok (operands[1], SFmode)" + [(const_int 0)] +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx op2 = operands[2]; + + emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1)); + + if (int_reg_operand (op0, DImode)) + { + emit_insn (gen_p8_mfvsrd_4_disf (op0, op2)); + emit_insn (gen_lshrdi3 (op0, op0, GEN_INT (32))); + } + else + { + rtx op0_si = gen_rtx_REG (SImode, REGNO (op0)); + rtx op1_v16qi = gen_rtx_REG (V16QImode, REGNO (op1)); + rtx byte_off = VECTOR_ELT_ORDER_BIG ? const0_rtx : GEN_INT (12); + emit_insn (gen_vextract4b (op0_si, op1_v16qi, byte_off)); + } + + DONE; +} + [(set_attr "type" + "*, load, fpload, fpload, mftgpr, + mffgpr, veclogical") + + (set_attr "length" + "4, 4, 4, 4, 12, + 4, 8")]) + ;; Split a load of a large constant into the appropriate two-insn ;; sequence. @@ -6963,9 +7117,11 @@ (define_insn "mov<mode>_hardfloat" "m, <f32_lm>, <f32_lm2>, Z, r, <f32_sr>, <f32_sr2>, <f32_av>, <zero_fp>, <zero_fp>, r, <f32_dm>, f, <f32_vsx>, r, r, *h, 0"))] - "(gpc_reg_operand (operands[0], <MODE>mode) - || gpc_reg_operand (operands[1], <MODE>mode)) - && (TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_SINGLE_FLOAT)" + "(register_operand (operands[0], <MODE>mode) + || register_operand (operands[1], <MODE>mode)) + && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_SINGLE_FLOAT + && (TARGET_ALLOW_SF_SUBREG + || valid_sf_si_move (operands[0], operands[1], <MODE>mode))" "@ lwz%U1%X1 %0,%1 <f32_li> @@ -7007,6 +7163,75 @@ (define_insn "*mov<mode>_softfloat" [(set_attr "type" "*,mtjmpr,mfjmpr,load,store,*,*,*,*,*") (set_attr "length" "4,4,4,4,4,4,4,4,8,4")]) +;; Like movsf, but adjust a SI value to be used in a SF context, i.e. +;; (set (reg:SF ...) (subreg:SF (reg:SI ...) 0)) +;; +;; Because SF values are actually stored as DF values within the vector +;; registers, we need to convert the value to the vector SF format when +;; we need to use the bits in a union or similar cases. We only need +;; to do this transformation when the value is a vector register. Loads, +;; stores, and transfers within GPRs are assumed to be safe. +;; +;; This is a more general case of reload_vsx_from_gprsf. That insn must have +;; no alternatives, because the call is created as part of secondary_reload, +;; and operand #2's register class is used to allocate the temporary register. +;; This function is called before reload, and it creates the temporary as +;; needed. 
+ +;; LWZ LFS LXSSP LXSSPX STW STFIWX +;; STXSIWX GPR->VSX VSX->GPR GPR->GPR +(define_insn_and_split "movsf_from_si" + [(set (match_operand:SF 0 "rs6000_nonimmediate_operand" + "=!r, f, wb, wu, m, Z, + Z, wy, ?r, !r") + + (unspec:SF [(match_operand:SI 1 "input_operand" + "m, m, wY, Z, r, f, + wu, r, wy, r")] + UNSPEC_SF_FROM_SI)) + + (clobber (match_scratch:DI 2 + "=X, X, X, X, X, X, + X, r, X, X"))] + + "TARGET_NO_SF_SUBREG + && (register_operand (operands[0], SFmode) + || register_operand (operands[1], SImode))" + "@ + lwz%U1%X1 %0,%1 + lfs%U1%X1 %0,%1 + lxssp %0,%1 + lxsspx %x0,%y1 + stw%U0%X0 %1,%0 + stfiwx %1,%y0 + stxsiwx %x1,%y0 + # + mfvsrwz %0,%x1 + mr %0,%1" + + "&& reload_completed + && vsx_reg_sfsubreg_ok (operands[0], SFmode) + && int_reg_operand_not_pseudo (operands[1], SImode)" + [(const_int 0)] +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx op2 = operands[2]; + rtx op1_di = gen_rtx_REG (DImode, REGNO (op1)); + + /* Move SF value to upper 32-bits for xscvspdpn. */ + emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32))); + emit_insn (gen_p8_mtvsrd_sf (op0, op2)); + emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0)); + DONE; +} + [(set_attr "length" + "4, 4, 4, 4, 4, 4, + 4, 12, 4, 4") + (set_attr "type" + "load, fpload, fpload, fpload, store, fpstore, + fpstore, vecfloat, mffgpr, *")]) + ;; Move 64-bit binary/decimal floating point (define_expand "mov<mode>" @@ -13217,11 +13442,11 @@ (define_insn "bpermd_<mode>" ;; Note that the conditions for expansion are in the FMA_F iterator. (define_expand "fma<mode>4" - [(set (match_operand:FMA_F 0 "register_operand" "") + [(set (match_operand:FMA_F 0 "gpc_reg_operand" "") (fma:FMA_F - (match_operand:FMA_F 1 "register_operand" "") - (match_operand:FMA_F 2 "register_operand" "") - (match_operand:FMA_F 3 "register_operand" "")))] + (match_operand:FMA_F 1 "gpc_reg_operand" "") + (match_operand:FMA_F 2 "gpc_reg_operand" "") + (match_operand:FMA_F 3 "gpc_reg_operand" "")))] "" "") @@ -13241,11 +13466,11 @@ (define_insn "*fma<mode>4_fpr" ; Altivec only has fma and nfms. (define_expand "fms<mode>4" - [(set (match_operand:FMA_F 0 "register_operand" "") + [(set (match_operand:FMA_F 0 "gpc_reg_operand" "") (fma:FMA_F - (match_operand:FMA_F 1 "register_operand" "") - (match_operand:FMA_F 2 "register_operand" "") - (neg:FMA_F (match_operand:FMA_F 3 "register_operand" ""))))] + (match_operand:FMA_F 1 "gpc_reg_operand" "") + (match_operand:FMA_F 2 "gpc_reg_operand" "") + (neg:FMA_F (match_operand:FMA_F 3 "gpc_reg_operand" ""))))] "!VECTOR_UNIT_ALTIVEC_P (<MODE>mode)" "") @@ -13265,34 +13490,34 @@ (define_insn "*fms<mode>4_fpr" ;; If signed zeros are ignored, -(a * b - c) = -a * b + c. (define_expand "fnma<mode>4" - [(set (match_operand:FMA_F 0 "register_operand" "") + [(set (match_operand:FMA_F 0 "gpc_reg_operand" "") (neg:FMA_F (fma:FMA_F - (match_operand:FMA_F 1 "register_operand" "") - (match_operand:FMA_F 2 "register_operand" "") - (neg:FMA_F (match_operand:FMA_F 3 "register_operand" "")))))] + (match_operand:FMA_F 1 "gpc_reg_operand" "") + (match_operand:FMA_F 2 "gpc_reg_operand" "") + (neg:FMA_F (match_operand:FMA_F 3 "gpc_reg_operand" "")))))] "!HONOR_SIGNED_ZEROS (<MODE>mode)" "") ;; If signed zeros are ignored, -(a * b + c) = -a * b - c. 
(define_expand "fnms<mode>4" - [(set (match_operand:FMA_F 0 "register_operand" "") + [(set (match_operand:FMA_F 0 "gpc_reg_operand" "") (neg:FMA_F (fma:FMA_F - (match_operand:FMA_F 1 "register_operand" "") - (match_operand:FMA_F 2 "register_operand" "") - (match_operand:FMA_F 3 "register_operand" ""))))] + (match_operand:FMA_F 1 "gpc_reg_operand" "") + (match_operand:FMA_F 2 "gpc_reg_operand" "") + (match_operand:FMA_F 3 "gpc_reg_operand" ""))))] "!HONOR_SIGNED_ZEROS (<MODE>mode) && !VECTOR_UNIT_ALTIVEC_P (<MODE>mode)" "") ; Not an official optab name, but used from builtins. (define_expand "nfma<mode>4" - [(set (match_operand:FMA_F 0 "register_operand" "") + [(set (match_operand:FMA_F 0 "gpc_reg_operand" "") (neg:FMA_F (fma:FMA_F - (match_operand:FMA_F 1 "register_operand" "") - (match_operand:FMA_F 2 "register_operand" "") - (match_operand:FMA_F 3 "register_operand" ""))))] + (match_operand:FMA_F 1 "gpc_reg_operand" "") + (match_operand:FMA_F 2 "gpc_reg_operand" "") + (match_operand:FMA_F 3 "gpc_reg_operand" ""))))] "!VECTOR_UNIT_ALTIVEC_P (<MODE>mode)" "") @@ -13313,12 +13538,12 @@ (define_insn "*nfma<mode>4_fpr" ; Not an official optab name, but used from builtins. (define_expand "nfms<mode>4" - [(set (match_operand:FMA_F 0 "register_operand" "") + [(set (match_operand:FMA_F 0 "gpc_reg_operand" "") (neg:FMA_F (fma:FMA_F - (match_operand:FMA_F 1 "register_operand" "") - (match_operand:FMA_F 2 "register_operand" "") - (neg:FMA_F (match_operand:FMA_F 3 "register_operand" "")))))] + (match_operand:FMA_F 1 "gpc_reg_operand" "") + (match_operand:FMA_F 2 "gpc_reg_operand" "") + (neg:FMA_F (match_operand:FMA_F 3 "gpc_reg_operand" "")))))] "" "") Index: gcc/testsuite/gcc.target/powerpc/pr71977-1.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/pr71977-1.c (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr71977-1.c (.../gcc/testsuite/gcc.target/powerpc) (revision 243966) @@ -0,0 +1,31 @@ +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-options "-mcpu=power8 -O2" } */ + +#include <stdint.h> + +typedef union +{ + float value; + uint32_t word; +} ieee_float_shape_type; + +float +mask_and_float_var (float f, uint32_t mask) +{ + ieee_float_shape_type u; + + u.value = f; + u.word &= mask; + + return u.value; +} + +/* { dg-final { scan-assembler "\[ \t\]xxland " } } */ +/* { dg-final { scan-assembler-not "\[ \t\]and " } } */ +/* { dg-final { scan-assembler-not "\[ \t\]mfvsrd " } } */ +/* { dg-final { scan-assembler-not "\[ \t\]stxv" } } */ +/* { dg-final { scan-assembler-not "\[ \t\]lxv" } } */ +/* { dg-final { scan-assembler-not "\[ \t\]srdi " } } */ Index: gcc/testsuite/gcc.target/powerpc/pr71977-2.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/pr71977-2.c (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr71977-2.c (.../gcc/testsuite/gcc.target/powerpc) (revision 243966) @@ -0,0 +1,31 @@ +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { 
dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-options "-mcpu=power8 -O2" } */ + +#include <stdint.h> + +typedef union +{ + float value; + uint32_t word; +} ieee_float_shape_type; + +float +mask_and_float_sign (float f) +{ + ieee_float_shape_type u; + + u.value = f; + u.word &= 0x80000000; + + return u.value; +} + +/* { dg-final { scan-assembler "\[ \t\]xxland " } } */ +/* { dg-final { scan-assembler-not "\[ \t\]and " } } */ +/* { dg-final { scan-assembler-not "\[ \t\]mfvsrd " } } */ +/* { dg-final { scan-assembler-not "\[ \t\]stxv" } } */ +/* { dg-final { scan-assembler-not "\[ \t\]lxv" } } */ +/* { dg-final { scan-assembler-not "\[ \t\]srdi " } } */