On Tue, Oct 1, 2013 at 1:52 PM, Michael Meissner
<meiss...@linux.vnet.ibm.com> wrote:
> This patch moves most of the VSX DFmode operations from vsx.md to rs6000.md to
> use the traditional floating point instructions (f*) instead of the VSX scalar
> instructions (xs*) if all of the registers come from the traditional floating
> point register set.  The add, subtract, multiply, divide, reciprocal estimate,
> square root, absolute value, negate, round functions, and multiply/add
> instructions were changed.  Some of the converts have not been changed with
> these patches.  If the -mupper-regs-df switch is used, it will attempt to use
> the upper registers (those that overlay on the traditional Altivec register
> set).
>
> This patch also combines the scalar SFmode/DFmode support on non-SPE systems.
> It adds in ISA 2.07 (power8) single precision floating point support if the
> -mupper-regs-sf switch is used.
>
> At present, neither -mupper-regs-df nor -mupper-regs-sf is usable if reload
> has to do anything.  A future patch will address this.
>
> I did need to adjust a few tests that were specifically testing VSX scalar
> code generation.  In addition, I put in a simple test to make sure the initial
> -mupper-regs-df and -mupper-regs-sf support works correctly.
>
> I tested this, and except for power7 and power8, I could not find any changes
> in the code generated for power4, power5, power6, power6x, G4, G5, cell, e5500,
> e6500, xilinx (sp_full, sp_lite, dp_full, dp_lite, none), 8548/8540 (spe), and
> 750cl (paired floating point).
>
> For VSX systems there are code generation differences:
>
>     1)  The traditional fp instruction is generated instead of VSX;
>
>     2)  Because of #1, the code generator favors the 4 operand form of the
>         multiply/add instructions, where the target register does not overlap
>         with any of the inputs, over the VSX version that requires overlap.
>
>     3)  A few of the vectorized tests on power8 now generate more direct move
>         instructions, instead of moving values through the stack as they did
>         previously.  These tests are integer tests, where you are doing an
>         operation between an integer vector and a scalar value.  Previously in
>         some cases, the register allocator would store the value from a GPR
>         and reload it to the vector registers.
>
>     4)  There is a slight scheduling difference in doing long double abs, which
>         causes a different register to be used.  The code for long double abs
>         needs to be improved in any case (the early splitting is causing
>         spills to the stack).
>
> I had no differences in doing bootstrap and make check (with the testsuite
> fixes applied).
>
> In addition, I am running Spec 2006 floating point tests on a power7 box to
> compare the effects of going back to the traditional floating point
> instructions.  For most tests, there is less than 2% difference between the
> runs.  One benchmark (482.sphinx3) is slightly faster with these changes, and
> it is dominated by floating point multiply/add operations.
>
> Can I apply these patches?
>
> [gcc]
> 2013-09-30  Michael Meissner  <meiss...@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000-builtin.def (XSRDPIM): Use floordf2,
>         ceildf2, btruncdf2, instead of vsx_* name.
>
>         * config/rs6000/vsx.md (vsx_add<mode>3): Change arithmetic
>         iterators to only do V2DF and V4SF here.  Move the DF code to
>         rs6000.md where it is combined with SF mode.  Replace <VSv> with
>         just 'v' since only vector operations are handled with these insns
>         after moving the DF support to rs6000.md.
>         (vsx_sub<mode>3): Likewise.
>         (vsx_mul<mode>3): Likewise.
>         (vsx_div<mode>3): Likewise.
>         (vsx_fre<mode>2): Likewise.
>         (vsx_neg<mode>2): Likewise.
>         (vsx_abs<mode>2): Likewise.
>         (vsx_nabs<mode>2): Likewise.
>         (vsx_smax<mode>3): Likewise.
>         (vsx_smin<mode>3): Likewise.
>         (vsx_sqrt<mode>2): Likewise.
>         (vsx_rsqrte<mode>2): Likewise.
>         (vsx_fms<mode>4): Likewise.
>         (vsx_nfma<mode>4): Likewise.
>         (vsx_copysign<mode>3): Likewise.
>         (vsx_btrunc<mode>2): Likewise.
>         (vsx_floor<mode>2): Likewise.
>         (vsx_ceil<mode>2): Likewise.
>         (vsx_smaxsf3): Delete scalar ops that were moved to rs6000.md.
>         (vsx_sminsf3): Likewise.
>         (vsx_fmadf4): Likewise.
>         (vsx_fmsdf4): Likewise.
>         (vsx_nfmadf4): Likewise.
>         (vsx_nfmsdf4): Likewise.
>         (vsx_cmpdf_internal1): Likewise.
>
>         * config/rs6000/rs6000.h (TARGET_SF_SPE): Define macros to make it
>         simpler to select whether a target has SPE or traditional floating
>         point support in iterators.
>         (TARGET_DF_SPE): Likewise.
>         (TARGET_SF_FPR): Likewise.
>         (TARGET_DF_FPR): Likewise.
>         (TARGET_SF_INSN): Macros to say whether floating point support
>         exists for a given operation for expanders.
>         (TARGET_DF_INSN): Likewise.
>
>         * config/rs6000/rs6000.md (Ftrad): New mode attributes to allow
>         combining of SF/DF mode operations, using both traditional and VSX
>         registers.
>         (Fvsx): Likewise.
>         (Ff): Likewise.
>         (Fv): Likewise.
>         (Fs): Likewise.
>         (Ffre): Likewise.
>         (FFRE): Likewise.
>         (abs<mode>2): Combine SF/DF modes using traditional floating point
>         instructions.  Add support for using the upper DF registers with
>         VSX support, and SF registers with power8-vector support.  Update
>         expanders for operations supported by both the SPE and traditional
>         floating point units.
>         (abs<mode>2_fpr): Likewise.
>         (nabs<mode>2): Likewise.
>         (nabs<mode>2_fpr): Likewise.
>         (neg<mode>2): Likewise.
>         (neg<mode>2_fpr): Likewise.
>         (add<mode>3): Likewise.
>         (add<mode>3_fpr): Likewise.
>         (sub<mode>3): Likewise.
>         (sub<mode>3_fpr): Likewise.
>         (mul<mode>3): Likewise.
>         (mul<mode>3_fpr): Likewise.
>         (div<mode>3): Likewise.
>         (div<mode>3_fpr): Likewise.
>         (sqrt<mode>2): Likewise.
>         (sqrt<mode>2_fpr): Likewise.
>         (fre<Fs>): Likewise.
>         (rsqrt<mode>2): Likewise.
>         (cmp<mode>_fpr): Likewise.
>         (smax<mode>3): Likewise.
>         (smin<mode>3): Likewise.
>         (smax<mode>3_vsx): Likewise.
>         (smin<mode>3_vsx): Likewise.
>         (negsf2): Delete SF operations that are merged with DF.
>         (abssf2): Likewise.
>         (addsf3): Likewise.
>         (subsf3): Likewise.
>         (mulsf3): Likewise.
>         (divsf3): Likewise.
>         (fres): Likewise.
>         (fmasf4_fpr): Likewise.
>         (fmssf4_fpr): Likewise.
>         (nfmasf4_fpr): Likewise.
>         (nfmssf4_fpr): Likewise.
>         (sqrtsf2): Likewise.
>         (rsqrtsf_internal1): Likewise.
>         (smaxsf3): Likewise.
>         (sminsf3): Likewise.
>         (cmpsf_internal1): Likewise.
>         (copysign<mode>3_fcpsgn): Add VSX/power8-vector support.
>         (negdf2): Delete DF operations that are merged with SF.
>         (absdf2): Likewise.
>         (nabsdf2): Likewise.
>         (adddf3): Likewise.
>         (subdf3): Likewise.
>         (muldf3): Likewise.
>         (divdf3): Likewise.
>         (fred): Likewise.
>         (rsqrtdf_internal1): Likewise.
>         (fmadf4_fpr): Likewise.
>         (fmsdf4_fpr): Likewise.
>         (nfmadf4_fpr): Likewise.
>         (nfmsdf4_fpr): Likewise.
>         (sqrtdf2): Likewise.
>         (smaxdf3): Likewise.
>         (smindf3): Likewise.
>         (cmpdf_internal1): Likewise.
>         (lrint<mode>di2): Use TARGET_<MODE>_FPR macro.
>         (btrunc<mode>2): Delete separate expander, and combine with the
>         insn and add VSX instruction support.  Use TARGET_<MODE>_FPR.
>         (btrunc<mode>2_fpr): Likewise.
>         (ceil<mode>2): Likewise.
>         (ceil<mode>2_fpr): Likewise.
>         (floor<mode>2): Likewise.
>         (floor<mode>2_fpr): Likewise.
>         (fma<mode>4_fpr): Combine SF and DF fused multiply/add support.
>         Add support for using the upper registers with VSX and
>         power8-vector.  Move insns to be closer to the define_expands. On
>         VSX systems, prefer the traditional form of FMA over the VSX
>         version, since the traditional form allows the target not to
>         overlap with the inputs.
>         (fms<mode>4_fpr): Likewise.
>         (nfma<mode>4_fpr): Likewise.
>         (nfms<mode>4_fpr): Likewise.
>
> [gcc/testsuite]
> 2013-09-30  Michael Meissner  <meiss...@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/p8vector-fp.c: New test for floating point
>         scalar operations when using -mupper-regs-sf and -mupper-regs-df.
>         * gcc.target/powerpc/ppc-target-1.c: Update tests to allow either
>         VSX scalar operations or the traditional floating point form of
>         the instruction.
>         * gcc.target/powerpc/ppc-target-2.c: Likewise.
>         * gcc.target/powerpc/recip-3.c: Likewise.
>         * gcc.target/powerpc/recip-5.c: Likewise.
>         * gcc.target/powerpc/pr72747.c: Likewise.
>         * gcc.target/powerpc/vsx-builtin-3.c: Likewise.

Okay.  Good cleanups.

Thanks, David
