On Tue, Oct 1, 2013 at 1:52 PM, Michael Meissner <meiss...@linux.vnet.ibm.com> wrote:
> This patch moves most of the VSX DFmode operations from vsx.md to rs6000.md to
> use the traditional floating point instructions (f*) instead of the VSX scalar
> instructions (xs*) if all of the registers come from the traditional floating
> point register set.  The add, subtract, multiply, divide, reciprocal estimate,
> square root, absolute value, negate, round functions, and multiply/add
> instructions were changed.  Some of the converts have not been changed with
> these patches.  If the -mupper-regs-df switch is used, it will attempt to use
> the upper registers (those that overlay on the traditional Altivec register
> set).
>
> This patch also combines the scalar SFmode/DFmode support on non-SPE systems.
> It adds in ISA 2.07 (power8) single precision floating point support if the
> -mupper-regs-sf switch is used.
>
> At present, neither -mupper-regs-df nor -mupper-regs-sf is usable if reload
> has to do anything.  A future patch will address this.
>
> I did need to adjust a few tests that were specifically testing VSX scalar
> code generation.  In addition, I put in a simple test to make sure the
> initial -mupper-regs-df and -mupper-regs-sf support works correctly.
>
> I tested this and, except for power7 and power8, I could not find any changes
> in code generated for power4, power5, power6, power6x, G4, G5, cell, e5500,
> e6500, xilinx (sp_full, sp_lite, dp_full, dp_lite, none), 8548/8540 (spe),
> 750cl (paired floating point).
>
> For VSX systems there are code generation differences:
>
>   1) The traditional fp instruction is generated instead of VSX;
>
>   2) Because of #1, the code generator favors the 4 operand form of
>      multiply/add instructions, where the target register does not overlap
>      with any of the inputs, over the VSX version that requires overlap.
>
>   3) A few of the vectorized tests on power8 now generate more direct move
>      instructions than previously, instead of moving values through the
>      stack.  These tests are integer tests, where you are doing an
>      operation between an integer vector and a scalar value.  Previously in
>      some cases, the register allocator would store the value from a GPR
>      and reload it to the vector registers.
>
>   4) There is a slight scheduling difference in doing long double abs that
>      causes a different register to be used.  The code for long double abs
>      needs to be improved in any case (the early splitting is causing
>      spills to the stack).
>
> I had no differences in doing bootstrap and make check (with the testsuite
> fixes applied).
>
> In addition, I am running Spec 2006 floating point tests on a power7 box to
> compare the effects of going back to the traditional floating point
> instructions.  For most tests, there is less than 2% difference between the
> runs.  One benchmark (482.sphinx3) is slightly faster with these changes, and
> it is dominated by floating point multiply/add operations.
>
> Can I apply these patches?
>
> [gcc]
> 2013-09-30  Michael Meissner  <meiss...@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000-builtin.def (XSRDPIM): Use floatdf2,
>         ceildf2, btruncdf2, instead of vsx_* name.
>
>         * config/rs6000/vsx.md (vsx_add<mode>3): Change arithmetic
>         iterators to only do V2DF and V4SF here.  Move the DF code to
>         rs6000.md where it is combined with SF mode.  Replace <VSv> with
>         just 'v' since only vector operations are handled with these insns
>         after moving the DF support to rs6000.md.
>         (vsx_sub<mode>3): Likewise.
>         (vsx_mul<mode>3): Likewise.
>         (vsx_div<mode>3): Likewise.
>         (vsx_fre<mode>2): Likewise.
>         (vsx_neg<mode>2): Likewise.
>         (vsx_abs<mode>2): Likewise.
>         (vsx_nabs<mode>2): Likewise.
>         (vsx_smax<mode>3): Likewise.
>         (vsx_smin<mode>3): Likewise.
>         (vsx_sqrt<mode>2): Likewise.
>         (vsx_rsqrte<mode>2): Likewise.
>         (vsx_fms<mode>4): Likewise.
>         (vsx_nfma<mode>4): Likewise.
>         (vsx_copysign<mode>3): Likewise.
>         (vsx_btrunc<mode>2): Likewise.
>         (vsx_floor<mode>2): Likewise.
>         (vsx_ceil<mode>2): Likewise.
>         (vsx_smaxsf3): Delete scalar ops that were moved to rs6000.md.
>         (vsx_sminsf3): Likewise.
>         (vsx_fmadf4): Likewise.
>         (vsx_fmsdf4): Likewise.
>         (vsx_nfmadf4): Likewise.
>         (vsx_nfmsdf4): Likewise.
>         (vsx_cmpdf_internal1): Likewise.
>
>         * config/rs6000/rs6000.h (TARGET_SF_SPE): Define macros to make it
>         simpler to select whether a target has SPE or traditional floating
>         point support in iterators.
>         (TARGET_DF_SPE): Likewise.
>         (TARGET_SF_FPR): Likewise.
>         (TARGET_DF_FPR): Likewise.
>         (TARGET_SF_INSN): Macros to say whether floating point support
>         exists for a given operation for expanders.
>         (TARGET_DF_INSN): Likewise.
>
>         * config/rs6000/rs6000.md (Ftrad): New mode attributes to allow
>         combining of SF/DF mode operations, using both traditional and VSX
>         registers.
>         (Fvsx): Likewise.
>         (Ff): Likewise.
>         (Fv): Likewise.
>         (Fs): Likewise.
>         (Ffre): Likewise.
>         (FFRE): Likewise.
>         (abs<mode>2): Combine SF/DF modes using traditional floating point
>         instructions.  Add support for using the upper DF registers with
>         VSX support, and SF registers with power8-vector support.  Update
>         expanders for operations supported by both the SPE and traditional
>         floating point units.
>         (abs<mode>2_fpr): Likewise.
>         (nabs<mode>2): Likewise.
>         (nabs<mode>2_fpr): Likewise.
>         (neg<mode>2): Likewise.
>         (neg<mode>2_fpr): Likewise.
>         (add<mode>3): Likewise.
>         (add<mode>3_fpr): Likewise.
>         (sub<mode>3): Likewise.
>         (sub<mode>3_fpr): Likewise.
>         (mul<mode>3): Likewise.
>         (mul<mode>3_fpr): Likewise.
>         (div<mode>3): Likewise.
>         (div<mode>3_fpr): Likewise.
>         (sqrt<mode>3): Likewise.
>         (sqrt<mode>3_fpr): Likewise.
>         (fre<Fs>): Likewise.
>         (rsqrt<mode>2): Likewise.
>         (cmp<mode>_fpr): Likewise.
>         (smax<mode>3): Likewise.
>         (smin<mode>3): Likewise.
>         (smax<mode>3_vsx): Likewise.
>         (smin<mode>3_vsx): Likewise.
>         (negsf2): Delete SF operations that are merged with DF.
>         (abssf2): Likewise.
>         (addsf3): Likewise.
>         (subsf3): Likewise.
>         (mulsf3): Likewise.
>         (divsf3): Likewise.
>         (fres): Likewise.
>         (fmasf4_fpr): Likewise.
>         (fmssf4_fpr): Likewise.
>         (nfmasf4_fpr): Likewise.
>         (nfmssf4_fpr): Likewise.
>         (sqrtsf2): Likewise.
>         (rsqrtsf_internal1): Likewise.
>         (smaxsf3): Likewise.
>         (sminsf3): Likewise.
>         (cmpsf_internal1): Likewise.
>         (copysign<mode>3_fcpsgn): Add VSX/power8-vector support.
>         (negdf2): Delete DF operations that are merged with SF.
>         (absdf2): Likewise.
>         (nabsdf2): Likewise.
>         (adddf3): Likewise.
>         (subdf3): Likewise.
>         (muldf3): Likewise.
>         (divdf3): Likewise.
>         (fred): Likewise.
>         (rsqrtdf_internal1): Likewise.
>         (fmadf4_fpr): Likewise.
>         (fmsdf4_fpr): Likewise.
>         (nfmadf4_fpr): Likewise.
>         (nfmsdf4_fpr): Likewise.
>         (sqrtdf2): Likewise.
>         (smaxdf3): Likewise.
>         (smindf3): Likewise.
>         (cmpdf_internal1): Likewise.
>         (lrint<mode>di2): Use TARGET_<MODE>_FPR macro.
>         (btrunc<mode>2): Delete separate expander, and combine with the
>         insn and add VSX instruction support.  Use TARGET_<MODE>_FPR.
>         (btrunc<mode>2_fpr): Likewise.
>         (ceil<mode>2): Likewise.
>         (ceil<mode>2_fpr): Likewise.
>         (floor<mode>2): Likewise.
>         (floor<mode>2_fpr): Likewise.
>         (fma<mode>4_fpr): Combine SF and DF fused multiply/add support.
>         Add support for using the upper registers with VSX and
>         power8-vector.  Move insns to be closer to the define_expands.  On
>         VSX systems, prefer the traditional form of FMA over the VSX
>         version, since the traditional form allows the target not to
>         overlap with the inputs.
>         (fms<mode>4_fpr): Likewise.
>         (nfma<mode>4_fpr): Likewise.
>         (nfms<mode>4_fpr): Likewise.
>
> [gcc/testsuite]
> 2013-09-30  Michael Meissner  <meiss...@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/p8vector-fp.c: New test for floating point
>         scalar operations when using -mupper-regs-sf and -mupper-regs-df.
>         * gcc.target/powerpc/ppc-target-1.c: Update tests to allow either
>         VSX scalar operations or the traditional floating point form of
>         the instruction.
>         * gcc.target/powerpc/ppc-target-2.c: Likewise.
>         * gcc.target/powerpc/recip-3.c: Likewise.
>         * gcc.target/powerpc/recip-5.c: Likewise.
>         * gcc.target/powerpc/pr72747.c: Likewise.
>         * gcc.target/powerpc/vsx-builtin-3.c: Likewise.
Okay. Good cleanups.

Thanks, David
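
For anyone following the thread, the kind of scalar code affected by the
change can be sketched roughly as below.  This is only an illustration and
not the p8vector-fp.c test from the patch; the function names are made up
for the example.  Per the description above, when all operands live in the
traditional floating point registers, the expectation is the f* form of each
instruction rather than the xs* form, and the 4 operand multiply/add for the
last two functions.

```c
/* Hypothetical illustration only: scalar SFmode/DFmode operations of the
   kind the patch description covers.  None of these names come from the
   patch or from the GCC testsuite.  */

/* DFmode add, subtract, multiply, divide.  */
double df_add (double a, double b) { return a + b; }
double df_sub (double a, double b) { return a - b; }
double df_mul (double a, double b) { return a * b; }
double df_div (double a, double b) { return a / b; }

/* Multiply/add.  The compiler may contract a * b + c into a single fused
   instruction; the result register need not be one of the inputs when the
   4 operand form is used.  */
double df_madd (double a, double b, double c) { return a * b + c; }

/* SFmode variant, relevant to -mupper-regs-sf on power8.  */
float sf_madd (float a, float b, float c) { return a * b + c; }
```

Assuming a PowerPC compiler built with the patch, something like
`gcc -O2 -mcpu=power8 -mupper-regs-df -mupper-regs-sf -S` on this file and a
look at the generated assembly should show which form was chosen.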