While working on improving power8 fusion support, I noticed that in several of the patterns (vector fused multiply-add, optimization of float (fix (x)), and vector reduction), I used the "ws" constraint, which is the constraint for scalar double precision floating point (currently FLOAT_REGS), in cases where the operand is a vector. In those cases we should use "wd" (preferred constraint for V2DF), "wf" (preferred constraint for V4SF) or even "wa" (any VSX register). As written, the register allocator might generate extra code because it prefers the traditional floating point registers.
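To make the affected FMA patterns concrete, here is a minimal sketch of the kind of source that should reach them. It is not part of the patch: the names v2df, v4sf, fma_v2df, fma_v4sf and check_fma are made up for illustration, and it assumes GCC's generic vector extension plus floating point contraction (the default in GNU C mode), so that a * b + c can be combined into a fused multiply-add.

```c
/* Illustrative test case only; not part of the patch.  All names here
   (v2df, v4sf, fma_v2df, fma_v4sf, check_fma) are made up.  */
#include <assert.h>

/* GCC generic vector types: 2 x double (V2DF) and 4 x float (V4SF).  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* Element-wise a * b + c on V2DF.  When contraction is enabled and VSX
   is available, this is a candidate for the *vsx_fmav2df4 pattern.  */
v2df
fma_v2df (v2df a, v2df b, v2df c)
{
  return a * b + c;
}

/* Same operation on V4SF; a candidate for the *vsx_fmav4sf4 pattern.  */
v4sf
fma_v4sf (v4sf a, v4sf b, v4sf c)
{
  return a * b + c;
}

/* Sanity check using exactly representable values, so fused and
   unfused evaluation give the same answer.  */
void
check_fma (void)
{
  v2df d = fma_v2df ((v2df) { 2.0, 3.0 }, (v2df) { 4.0, 5.0 },
                     (v2df) { 1.0, 1.0 });
  v4sf f = fma_v4sf ((v4sf) { 1.0f, 2.0f, 3.0f, 4.0f },
                     (v4sf) { 2.0f, 2.0f, 2.0f, 2.0f },
                     (v4sf) { 1.0f, 1.0f, 1.0f, 1.0f });

  assert (d[0] == 9.0 && d[1] == 16.0);
  assert (f[0] == 3.0f && f[1] == 5.0f && f[2] == 7.0f && f[3] == 9.0f);
}
```

Compiling something like this with -O2 -mcpu=power8 -S and looking at which registers feed the xvmaddadp/xvmaddasp instructions (or their "m" forms) is one way to see the effect of the constraint choice. I have not listed expected assembly here, since the exact output depends on the compiler revision and options.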
I was curious about the code generation changes, so I built power8 versions of the Spec 2006 benchmark suite, and compared the number of instructions generated, using the same options. Most of the floating point benchmarks had some changes in code generation, including fewer scalar floating point loads/stores (where the RA picked a traditional scalar register, which meant elsewhere a scalar was spilled to the stack), and different encodings of the FMA instructions.

I did a run of the FP spec benchmarks on a big endian power8 system. There were no significant regressions, and the cactusADM benchmark sped up by 2%.

I did a bootstrap/make check comparison, and there were no regressions. Is it ok to install in trunk and the active PowerPC branches?

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 214455)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -905,11 +905,11 @@ (define_insn "*vsx_tsqrt<mode>2_internal
 ;; multiply.
 
 (define_insn "*vsx_fmav4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=ws,ws,?wa,?wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wf,wf,?wa,?wa,v")
 	(fma:V4SF
-	  (match_operand:V4SF 1 "vsx_register_operand" "%ws,ws,wa,wa,v")
-	  (match_operand:V4SF 2 "vsx_register_operand" "ws,0,wa,0,v")
-	  (match_operand:V4SF 3 "vsx_register_operand" "0,ws,0,wa,v")))]
+	  (match_operand:V4SF 1 "vsx_register_operand" "%wf,wf,wa,wa,v")
+	  (match_operand:V4SF 2 "vsx_register_operand" "wf,0,wa,0,v")
+	  (match_operand:V4SF 3 "vsx_register_operand" "0,wf,0,wa,v")))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "@
    xvmaddasp %x0,%x1,%x2
@@ -920,11 +920,11 @@ (define_insn "*vsx_fmav4sf4"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "*vsx_fmav2df4"
-  [(set (match_operand:V2DF 0 "vsx_register_operand" "=ws,ws,?wa,?wa")
+  [(set (match_operand:V2DF 0 "vsx_register_operand" "=wd,wd,?wa,?wa")
 	(fma:V2DF
-	  (match_operand:V2DF 1 "vsx_register_operand" "%ws,ws,wa,wa")
-	  (match_operand:V2DF 2 "vsx_register_operand" "ws,0,wa,0")
-	  (match_operand:V2DF 3 "vsx_register_operand" "0,ws,0,wa")))]
+	  (match_operand:V2DF 1 "vsx_register_operand" "%wd,wd,wa,wa")
+	  (match_operand:V2DF 2 "vsx_register_operand" "wd,0,wa,0")
+	  (match_operand:V2DF 3 "vsx_register_operand" "0,wd,0,wa")))]
   "VECTOR_UNIT_VSX_P (V2DFmode)"
   "@
    xvmaddadp %x0,%x1,%x2
@@ -1360,8 +1360,8 @@ (define_insn "*vsx_float_fix_<mode>2"
 (define_insn "vsx_concat_<mode>"
   [(set (match_operand:VSX_D 0 "vsx_register_operand" "=<VSr>,?<VSa>")
 	(vec_concat:VSX_D
-	 (match_operand:<VS_scalar> 1 "vsx_register_operand" "ws,<VSa>")
-	 (match_operand:<VS_scalar> 2 "vsx_register_operand" "ws,<VSa>")))]
+	 (match_operand:<VS_scalar> 1 "vsx_register_operand" "<VS_64reg>,<VSa>")
+	 (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
   if (BYTES_BIG_ENDIAN)
@@ -2018,7 +2018,7 @@ (define_insn_and_split "*vsx_reduc_<VEC_
 ;; to the top element of the V2DF array without doing an extract.
 
 (define_insn_and_split "*vsx_reduc_<VEC_reduc_name>_v2df_scalar"
-  [(set (match_operand:DF 0 "vfloat_operand" "=&ws,&?wa,ws,?wa")
+  [(set (match_operand:DF 0 "vfloat_operand" "=&ws,&?ws,ws,?ws")
 	(vec_select:DF
 	 (VEC_reduc:V2DF
 	  (vec_concat:V2DF