On PowerPC, starting with ISA 2.07 (power8), moving a single precision value
(SFmode) from a vector register to a GPR involves converting the scalar value
in the register from the internal double (DFmode) format to the 32-bit
vector/storage format, moving it to the GPR, and then shifting right 32 bits
to place the value in the bottom 32 bits of the GPR for use as a scalar:
xscvdpspn 0,1
mfvsrd 3,0
srdi 3,3,32
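For reference, the kind of C source that produces this sequence is a bit-level
reinterpretation of a float as a 32-bit integer. A minimal sketch (the helper
name is mine, not from the patch):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit integer.  When the value
   lives in a vector register on power8, this is where the compiler emits
   the xscvdpspn / mfvsrd / srdi sequence shown above.  */
uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);	/* well-defined type pun in C */
  return u;
}
```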
It turns out that current processors, from ISA 2.06 (power7) through ISA 3.0
(power9), actually duplicate the 32-bit value produced by the XSCVDPSPN and
XSCVDPSP instructions, placing it in both the first and second 32-bit words of
the register.  This allows us to eliminate the shift instruction, since the
value is already in the correct location for a 32-bit scalar.
ISA 3.0 is being updated to include this specification (and other fixes) so
that future processors will also be able to eliminate the shift.
The new code is:
xscvdpspn 0,1
mfvsrwz 3,0
While working on these modifications, I noticed that if the user rounds a
DFmode value to SFmode and then moves it to a GPR, we would originally
generate:
frsp 1,2
xscvdpspn 0,1
mfvsrd 3,0
srdi 3,3,32
The XSCVDPSP instruction already handles values outside of the SFmode range
(XSCVDPSPN does not), so I added a combiner pattern that merges the two
instructions into:
xscvdpsp 0,1
mfvsrwz 3,0
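The source pattern that triggers this combination is a DFmode-to-SFmode round
followed by a bit move of the SFmode result; a minimal sketch (helper name is
mine, not from the patch):

```c
#include <stdint.h>
#include <string.h>

/* Round a double to float and fetch the float's bit pattern.  The new
   combiner pattern replaces the frsp + xscvdpspn pair for this idiom
   with a single xscvdpsp before the move to the GPR.  */
uint32_t
double_to_float_bits (double d)
{
  float f = (float) d;		/* DFmode -> SFmode round (frsp)  */
  uint32_t u;
  memcpy (&u, &f, sizeof u);	/* SFmode bits -> 32-bit integer  */
  return u;
}
```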
While I was looking at the code, I noticed that if we have a SImode value in a
vector register and want to sign extend it, leaving the result in a GPR, on
power8 the register allocator would do the extension via a 32-bit integer
store and a sign-extending load into the GPR.  I added a splitter to convert
this into a pair of MFVSRWZ and EXTSW instructions.
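At the C level the optimization covers ordinary 32-to-64-bit sign extension
when the 32-bit input happens to live in a vector register; a minimal sketch
(helper name is mine, not from the patch):

```c
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits.  When the SImode input is
   already in a vector register, the new splitter emits mfvsrwz + extsw
   instead of a store/load round trip through memory.  */
int64_t
sext_si (int32_t x)
{
  return (int64_t) x;		/* extsw when x is in a GPR path */
}
```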
I built Spec 2006 with the changes, and I noticed the following changes in the
code:
* Round DF->SF and move to GPR: namd, wrf;
* Eliminate 32-bit shift: gromacs, namd, povray, wrf;
* Use of MFVSRWZ/EXTSW: gromacs, povray, calculix, h264ref.
I have bootstrapped these changes on the following machines with no
regressions in the test suite:
* Big endian power7 (with both 32/64-bit targets);
* Little endian power8;
* Little endian power9 prototype.
Can I check these changes into GCC 8? Can I back port these changes into the
GCC 7 branch?
[gcc]
2017-09-19 Michael Meissner <[email protected]>
* config/rs6000/vsx.md (vsx_xscvspdp_scalar2): Move insn so it is
next to vsx_xscvspdp.
	(vsx_xscvdpsp_scalar): Use 'ww' constraint instead of 'f' to allow
	SFmode values in Altivec registers.
	(vsx_xscvdpspn): Eliminate unneeded alternative.  Use correct
	constraint ('ws') for DFmode.
(vsx_xscvspdpn): Likewise.
(vsx_xscvdpspn_scalar): Likewise.
(peephole for optimizing move SF to GPR): Adjust code to eliminate
needing to do the shift right 32-bits operation after XSCVDPSPN.
* config/rs6000/rs6000.md (extendsi<mode>2): Add alternative to do
sign extend from vector register to GPR via a split, preventing
the register allocator from doing the move via store/load.
(extendsi<mode>2 splitter): Likewise.
(movsi_from_sf): Adjust code to eliminate doing a 32-bit shift
right or vector extract after doing XSCVDPSPN. Use MFVSRWZ
instead of MFVSRD to move the value to a GPR register.
(movdi_from_sf_zero_ext): Likewise.
(movsi_from_df): Add optimization to merge a convert from DFmode
to SFmode and moving the SFmode to a GPR to use XSCVDPSP instead
of round and XSCVDPSPN.
(reload_gpr_from_vsxsf): Use MFVSRWZ instead of MFVSRD to move the
value to a GPR register. Rename p8_mfvsrd_4_disf insn to
p8_mfvsrwz_disf.
(p8_mfvsrd_4_disf): Likewise.
(p8_mfvsrwz_disf): Likewise.
[gcc/testsuite]
2017-09-19 Michael Meissner <[email protected]>
* gcc.target/powerpc/pr71977-1.c: Adjust scan-assembler codes to
reflect that we don't generate a 32-bit shift right after
XSCVDPSPN.
* gcc.target/powerpc/direct-move-float1.c: Likewise.
* gcc.target/powerpc/direct-move-float3.c: New test.
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: [email protected], phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md (revision 252844)
+++ gcc/config/rs6000/vsx.md (working copy)
@@ -1781,6 +1781,15 @@ (define_insn "vsx_xscvspdp"
"xscvspdp %x0,%x1"
[(set_attr "type" "fp")])
+;; Same as vsx_xscvspdp, but use SF as the type
+(define_insn "vsx_xscvspdp_scalar2"
+ [(set (match_operand:SF 0 "vsx_register_operand" "=ww")
+ (unspec:SF [(match_operand:V4SF 1 "vsx_register_operand" "wa")]
+ UNSPEC_VSX_CVSPDP))]
+ "VECTOR_UNIT_VSX_P (V4SFmode)"
+ "xscvspdp %x0,%x1"
+ [(set_attr "type" "fp")])
+
;; Generate xvcvhpsp instruction
(define_insn "vsx_xvcvhpsp"
[(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
@@ -1794,41 +1803,32 @@ (define_insn "vsx_xvcvhpsp"
;; format of scalars is actually DF.
(define_insn "vsx_xscvdpsp_scalar"
[(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
- (unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "f")]
+ (unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "ww")]
UNSPEC_VSX_CVSPDP))]
"VECTOR_UNIT_VSX_P (V4SFmode)"
"xscvdpsp %x0,%x1"
[(set_attr "type" "fp")])
-;; Same as vsx_xscvspdp, but use SF as the type
-(define_insn "vsx_xscvspdp_scalar2"
- [(set (match_operand:SF 0 "vsx_register_operand" "=ww")
- (unspec:SF [(match_operand:V4SF 1 "vsx_register_operand" "wa")]
- UNSPEC_VSX_CVSPDP))]
- "VECTOR_UNIT_VSX_P (V4SFmode)"
- "xscvspdp %x0,%x1"
- [(set_attr "type" "fp")])
-
;; ISA 2.07 xscvdpspn/xscvspdpn that does not raise an error on signalling NaNs
(define_insn "vsx_xscvdpspn"
- [(set (match_operand:V4SF 0 "vsx_register_operand" "=ww,?ww")
- (unspec:V4SF [(match_operand:DF 1 "vsx_register_operand" "wd,wa")]
+ [(set (match_operand:V4SF 0 "vsx_register_operand" "=ww")
+ (unspec:V4SF [(match_operand:DF 1 "vsx_register_operand" "ws")]
UNSPEC_VSX_CVDPSPN))]
"TARGET_XSCVDPSPN"
"xscvdpspn %x0,%x1"
[(set_attr "type" "fp")])
(define_insn "vsx_xscvspdpn"
- [(set (match_operand:DF 0 "vsx_register_operand" "=ws,?ws")
- (unspec:DF [(match_operand:V4SF 1 "vsx_register_operand" "wf,wa")]
+ [(set (match_operand:DF 0 "vsx_register_operand" "=ws")
+ (unspec:DF [(match_operand:V4SF 1 "vsx_register_operand" "wa")]
UNSPEC_VSX_CVSPDPN))]
"TARGET_XSCVSPDPN"
"xscvspdpn %x0,%x1"
[(set_attr "type" "fp")])
(define_insn "vsx_xscvdpspn_scalar"
- [(set (match_operand:V4SF 0 "vsx_register_operand" "=wf,?wa")
- (unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "ww,ww")]
+ [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
+ (unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "ww")]
UNSPEC_VSX_CVDPSPN))]
"TARGET_XSCVDPSPN"
"xscvdpspn %x0,%x1"
@@ -4773,15 +4773,13 @@ (define_constants
;;
;; (set (reg:DI reg3) (unspec:DI [(reg:V4SF reg2)] UNSPEC_P8V_RELOAD_FROM_VSX))
;;
-;; (set (reg:DI reg3) (lshiftrt:DI (reg:DI reg3) (const_int 32)))
+;; (set (reg:DI reg4) (and:DI (reg:DI reg3) (reg:DI reg3)))
;;
-;; (set (reg:DI reg5) (and:DI (reg:DI reg3) (reg:DI reg4)))
+;; (set (reg:DI reg5) (ashift:DI (reg:DI reg4) (const_int 32)))
;;
-;; (set (reg:DI reg6) (ashift:DI (reg:DI reg5) (const_int 32)))
+;; (set (reg:SF reg6) (unspec:SF [(reg:DI reg5)] UNSPEC_P8V_MTVSRD))
;;
-;; (set (reg:SF reg7) (unspec:SF [(reg:DI reg6)] UNSPEC_P8V_MTVSRD))
-;;
-;; (set (reg:SF reg7) (unspec:SF [(reg:SF reg7)] UNSPEC_VSX_CVSPDPN))
+;; (set (reg:SF reg6) (unspec:SF [(reg:SF reg6)] UNSPEC_VSX_CVSPDPN))
(define_peephole2
[(match_scratch:DI SFBOOL_TMP_GPR "r")
@@ -4792,11 +4790,6 @@ (define_peephole2
(unspec:DI [(match_operand:V4SF SFBOOL_MFVSR_A "vsx_register_operand")]
UNSPEC_P8V_RELOAD_FROM_VSX))
- ;; SRDI
- (set (match_dup SFBOOL_MFVSR_D)
- (lshiftrt:DI (match_dup SFBOOL_MFVSR_D)
- (const_int 32)))
-
;; AND/IOR/XOR operation on int
(set (match_operand:SI SFBOOL_BOOL_D "int_reg_operand")
(and_ior_xor:SI (match_operand:SI SFBOOL_BOOL_A1 "int_reg_operand")
@@ -4820,15 +4813,15 @@ (define_peephole2
&& (REG_P (operands[SFBOOL_BOOL_A2])
|| CONST_INT_P (operands[SFBOOL_BOOL_A2]))
&& (REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_MFVSR_D])
- || peep2_reg_dead_p (3, operands[SFBOOL_MFVSR_D]))
+ || peep2_reg_dead_p (2, operands[SFBOOL_MFVSR_D]))
&& (REGNO (operands[SFBOOL_MFVSR_D]) == REGNO (operands[SFBOOL_BOOL_A1])
|| (REG_P (operands[SFBOOL_BOOL_A2])
&& REGNO (operands[SFBOOL_MFVSR_D])
== REGNO (operands[SFBOOL_BOOL_A2])))
&& REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_SHL_A])
&& (REGNO (operands[SFBOOL_SHL_D]) == REGNO (operands[SFBOOL_BOOL_D])
- || peep2_reg_dead_p (4, operands[SFBOOL_BOOL_D]))
- && peep2_reg_dead_p (5, operands[SFBOOL_SHL_D])"
+ || peep2_reg_dead_p (3, operands[SFBOOL_BOOL_D]))
+ && peep2_reg_dead_p (4, operands[SFBOOL_SHL_D])"
[(set (match_dup SFBOOL_TMP_GPR)
(ashift:DI (match_dup SFBOOL_BOOL_A_DI)
(const_int 32)))
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md (revision 252844)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -986,8 +986,11 @@ (define_insn_and_split "*extendhi<mode>2
(define_insn "extendsi<mode>2"
- [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,r,wl,wu,wj,wK,wH")
-	(sign_extend:EXTSI (match_operand:SI 1 "lwa_operand" "Y,r,Z,Z,r,wK,wH")))]
+ [(set (match_operand:EXTSI 0 "gpc_reg_operand"
+ "=r, r, wl, wu, wj, wK, wH, wr")
+
+ (sign_extend:EXTSI (match_operand:SI 1 "lwa_operand"
+ "Y, r, Z, Z, r, wK, wH, ?wIwH")))]
""
"@
lwa%U1%X1 %0,%1
@@ -996,10 +999,23 @@ (define_insn "extendsi<mode>2"
lxsiwax %x0,%y1
mtvsrwa %x0,%1
vextsw2d %0,%1
+ #
#"
- [(set_attr "type" "load,exts,fpload,fpload,mffgpr,vecexts,vecperm")
+ [(set_attr "type" "load,exts,fpload,fpload,mffgpr,vecexts,vecperm,mftgpr")
(set_attr "sign_extend" "yes")
- (set_attr "length" "4,4,4,4,4,4,8")])
+ (set_attr "length" "4,4,4,4,4,4,8,8")])
+
+(define_split
+ [(set (match_operand:DI 0 "int_reg_operand")
+ (sign_extend:DI (match_operand:SI 1 "vsx_register_operand")))]
+ "TARGET_DIRECT_MOVE_64BIT && reload_completed"
+ [(set (match_dup 2)
+ (match_dup 1))
+ (set (match_dup 0)
+ (sign_extend:DI (match_dup 2)))]
+{
+ operands[2] = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
+})
(define_split
[(set (match_operand:DI 0 "altivec_register_operand")
@@ -6790,25 +6806,25 @@ (define_insn "*movsi_internal1_single"
;; needed.
;; MR LWZ LFIWZX LXSIWZX STW
-;; STFS STXSSP STXSSPX VSX->GPR MTVSRWZ
-;; VSX->VSX
+;; STFS STXSSP STXSSPX VSX->GPR VSX->VSX,
+;; MTVSRWZ
(define_insn_and_split "movsi_from_sf"
[(set (match_operand:SI 0 "nonimmediate_operand"
"=r, r, ?*wI, ?*wH, m,
- m, wY, Z, r, wIwH,
- ?wK")
+ m, wY, Z, r, ?*wIwH,
+ wIwH")
(unspec:SI [(match_operand:SF 1 "input_operand"
"r, m, Z, Z, r,
- f, wb, wu, wIwH, r,
- wK")]
+ f, wb, wu, wIwH, wIwH,
+ r")]
UNSPEC_SI_FROM_SF))
(clobber (match_scratch:V4SF 2
"=X, X, X, X, X,
X, X, X, wa, X,
- wa"))]
+ X"))]
"TARGET_NO_SF_SUBREG
&& (register_operand (operands[0], SImode)
@@ -6823,10 +6839,10 @@ (define_insn_and_split "movsi_from_sf"
stxssp %1,%0
stxsspx %x1,%y0
#
- mtvsrwz %x0,%1
- #"
+ xscvdpspn %x0,%x1
+ mtvsrwz %x0,%1"
"&& reload_completed
- && register_operand (operands[0], SImode)
+ && int_reg_operand (operands[0], SImode)
&& vsx_reg_sfsubreg_ok (operands[1], SFmode)"
[(const_int 0)]
{
@@ -6836,50 +6852,38 @@ (define_insn_and_split "movsi_from_sf"
rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1));
-
- if (int_reg_operand (op0, SImode))
- {
- emit_insn (gen_p8_mfvsrd_4_disf (op0_di, op2));
- emit_insn (gen_lshrdi3 (op0_di, op0_di, GEN_INT (32)));
- }
- else
- {
- rtx op1_v16qi = gen_rtx_REG (V16QImode, REGNO (op1));
- rtx byte_off = VECTOR_ELT_ORDER_BIG ? const0_rtx : GEN_INT (12);
- emit_insn (gen_vextract4b (op0_di, op1_v16qi, byte_off));
- }
-
+ emit_insn (gen_p8_mfvsrwz_disf (op0_di, op2));
DONE;
}
[(set_attr "type"
"*, load, fpload, fpload, store,
- fpstore, fpstore, fpstore, mftgpr, mffgpr,
- veclogical")
+ fpstore, fpstore, fpstore, mftgpr, fp,
+ mffgpr")
(set_attr "length"
"4, 4, 4, 4, 4,
- 4, 4, 4, 12, 4,
- 8")])
+ 4, 4, 4, 8, 4,
+ 4")])
;; movsi_from_sf with zero extension
;;
;; RLDICL LWZ LFIWZX LXSIWZX VSX->GPR
-;; MTVSRWZ VSX->VSX
+;; VSX->VSX MTVSRWZ
(define_insn_and_split "*movdi_from_sf_zero_ext"
[(set (match_operand:DI 0 "gpc_reg_operand"
"=r, r, ?*wI, ?*wH, r,
- wIwH, ?wK")
+ wK, wIwH")
(zero_extend:DI
(unspec:SI [(match_operand:SF 1 "input_operand"
"r, m, Z, Z, wIwH,
- r, wK")]
+ wIwH, r")]
UNSPEC_SI_FROM_SF)))
(clobber (match_scratch:V4SF 2
"=X, X, X, X, wa,
- X, wa"))]
+ X, X"))]
"TARGET_DIRECT_MOVE_64BIT
&& (register_operand (operands[0], DImode)
@@ -6890,9 +6894,10 @@ (define_insn_and_split "*movdi_from_sf_z
lfiwzx %0,%y1
lxsiwzx %x0,%y1
#
- mtvsrwz %x0,%1
- #"
+ #
+ mtvsrwz %x0,%1"
"&& reload_completed
+ && register_operand (operands[0], DImode)
&& vsx_reg_sfsubreg_ok (operands[1], SFmode)"
[(const_int 0)]
{
@@ -6901,29 +6906,43 @@ (define_insn_and_split "*movdi_from_sf_z
rtx op2 = operands[2];
emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1));
-
if (int_reg_operand (op0, DImode))
- {
- emit_insn (gen_p8_mfvsrd_4_disf (op0, op2));
- emit_insn (gen_lshrdi3 (op0, op0, GEN_INT (32)));
- }
+ emit_insn (gen_p8_mfvsrwz_disf (op0, op2));
else
{
- rtx op0_si = gen_rtx_REG (SImode, REGNO (op0));
- rtx op1_v16qi = gen_rtx_REG (V16QImode, REGNO (op1));
- rtx byte_off = VECTOR_ELT_ORDER_BIG ? const0_rtx : GEN_INT (12);
- emit_insn (gen_vextract4b (op0_si, op1_v16qi, byte_off));
+ rtx op2_si = gen_rtx_REG (SImode, reg_or_subregno (op2));
+ emit_insn (gen_zero_extendsidi2 (op0, op2_si));
}
DONE;
}
[(set_attr "type"
"*, load, fpload, fpload, mftgpr,
- mffgpr, veclogical")
+ vecexts, mffgpr")
(set_attr "length"
- "4, 4, 4, 4, 12,
- 4, 8")])
+ "4, 4, 4, 4, 8,
+ 8, 4")])
+
+;; Like movsi_from_sf, but combine a convert from DFmode to SFmode before
+;; moving it to SImode. We can do a SFmode store without having to do the
+;; conversion explicitly. If we are doing a register->register conversion, use
+;; XSCVDPSP instead of XSCVDPSPN, since the former handles cases where the
+;; input will not fit in SFmode, and the latter assumes the value has already
+;; been rounded.
+(define_insn "*movsi_from_df"
+ [(set (match_operand:SI 0 "nonimmediate_operand" "=wa,m,wY,Z")
+ (unspec:SI [(float_truncate:SF
+ (match_operand:DF 1 "gpc_reg_operand" "wa, f,wb,wa"))]
+ UNSPEC_SI_FROM_SF))]
+
+ "TARGET_NO_SF_SUBREG"
+ "@
+ xscvdpsp %x0,%x1
+ stfs%U0%X0 %1,%0
+ stxssp %1,%0
+ stxsspx %x1,%y0"
+ [(set_attr "type" "fp,fpstore,fpstore,fpstore")])
;; Split a load of a large constant into the appropriate two-insn
;; sequence.
@@ -8437,19 +8456,20 @@ (define_insn_and_split "reload_gpr_from_
rtx diop0 = simplify_gen_subreg (DImode, op0, SFmode, 0);
emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1));
- emit_insn (gen_p8_mfvsrd_4_disf (diop0, op2));
- emit_insn (gen_lshrdi3 (diop0, diop0, GEN_INT (32)));
+ emit_insn (gen_p8_mfvsrwz_disf (diop0, op2));
DONE;
}
[(set_attr "length" "12")
(set_attr "type" "three")])
-(define_insn "p8_mfvsrd_4_disf"
+;; XSCVDPSPN puts the 32-bit value in both the first and second words, so we do
+;; not need to do a shift to extract the value.
+(define_insn "p8_mfvsrwz_disf"
[(set (match_operand:DI 0 "register_operand" "=r")
(unspec:DI [(match_operand:V4SF 1 "register_operand" "wa")]
UNSPEC_P8V_RELOAD_FROM_VSX))]
"TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
- "mfvsrd %0,%x1"
+ "mfvsrwz %0,%x1"
[(set_attr "type" "mftgpr")])
Index: gcc/testsuite/gcc.target/powerpc/pr71977-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr71977-1.c (revision 252844)
+++ gcc/testsuite/gcc.target/powerpc/pr71977-1.c (working copy)
@@ -23,9 +23,9 @@ mask_and_float_var (float f, uint32_t ma
return u.value;
}
-/* { dg-final { scan-assembler "\[ \t\]xxland " } } */
-/* { dg-final { scan-assembler-not "\[ \t\]and " } } */
-/* { dg-final { scan-assembler-not "\[ \t\]mfvsrd " } } */
-/* { dg-final { scan-assembler-not "\[ \t\]stxv" } } */
-/* { dg-final { scan-assembler-not "\[ \t\]lxv" } } */
-/* { dg-final { scan-assembler-not "\[ \t\]srdi " } } */
+/* { dg-final { scan-assembler {\mxxland\M} } } */
+/* { dg-final { scan-assembler-not {\mand\M} } } */
+/* { dg-final { scan-assembler-not {\mmfvsrwz\M} } } */
+/* { dg-final { scan-assembler-not {\mstxv\M} } } */
+/* { dg-final { scan-assembler-not {\mlxv\M} } } */
+/* { dg-final { scan-assembler-not {\msrdi\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/direct-move-float1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move-float1.c	(revision 252844)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-float1.c (working copy)
@@ -5,7 +5,7 @@
/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
/* { dg-options "-mcpu=power8 -O2" } */
/* { dg-final { scan-assembler "mtvsrd" } } */
-/* { dg-final { scan-assembler "mfvsrd" } } */
+/* { dg-final { scan-assembler "mfvsrwz" } } */
/* { dg-final { scan-assembler "xscvdpspn" } } */
/* { dg-final { scan-assembler "xscvspdpn" } } */
Index: gcc/testsuite/gcc.target/powerpc/direct-move-float3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move-float3.c (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-float3.c (revision 0)
@@ -0,0 +1,28 @@
+/* { dg-do compile { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-skip-if "" { powerpc*-*-*spe* } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mpower8-vector -O2" } */
+
+/* Test that we generate XSCVDPSP instead of FRSP and XSCVDPSPN when we combine
+ a round from double to float and moving the float value to a GPR. */
+
+union u {
+ float f;
+ unsigned int ui;
+ int si;
+};
+
+unsigned int
+ui_d (double d)
+{
+ union u x;
+ x.f = d;
+ return x.ui;
+}
+
+/* { dg-final { scan-assembler {\mmfvsrwz\M} } } */
+/* { dg-final { scan-assembler {\mxscvdpsp\M} } } */
+/* { dg-final { scan-assembler-not {\mmtvsrd\M} } } */
+/* { dg-final { scan-assembler-not {\mxscvdpspn\M} } } */
+/* { dg-final { scan-assembler-not {\msrdi\M} } } */