Hi!
On Thu, Apr 06, 2023 at 11:12:11AM -0400, Michael Meissner wrote:
> The Altivec instructions fmaddfp and fnmsubfp have different rounding
> behaviors
Those are not existing instructions. You mean "vmaddfp" etc.
> than the VSX xvmaddsp and xvnmsubsp instructions. In particular, generating
> these instructions seems to break Eigen.
Those instructions use round-to-nearest-ties-to-even, like all other
VMX FP insns. A proper patch has to deal with all VMX FP insns. But,
almost all programs expect that rounding mode anyway, so this is not a
problem in practice. What happened on Eigen is that the Linux kernel
starts every new process with VSCR[NJ]=1, breaking pretty much
everything that wants floating point for non-toy purposes. (There
currently is a bug on LE that sets the wrong bit, hiding the problem in
that configuration, but NJ=1 is what is intended there as well.)
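To make the NJ=1 effect concrete, here is a rough host-side model in plain C. It is only a sketch: flush_denormal, madd_nj and nj_model_selfcheck are made-up names for illustration, not GCC or kernel interfaces, and it assumes NJ=1 means "denormal inputs are read as zero and denormal results are written as zero, keeping the sign". It does not model the fused (single-rounding) behaviour of the real instruction.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical model, not a GCC or kernel interface: what NJ=1 does to a
   single float.  A denormal becomes a zero of the same sign; normals,
   zeros, infinities and NaNs pass through unchanged.  */
float flush_denormal(float x)
{
    if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
        return x < 0.0f ? -0.0f : 0.0f;
    return x;
}

/* Model of one multiply-add lane with NJ=1: flush the inputs, compute,
   flush the result.  The real insn is fused; this plain expression may
   round twice, which does not matter for the flushing shown here.  */
float madd_nj(float a, float b, float c)
{
    a = flush_denormal(a);
    b = flush_denormal(b);
    c = flush_denormal(c);
    return flush_denormal(a * b + c);
}

/* Small self-check showing where the model and IEEE arithmetic differ.  */
void nj_model_selfcheck(void)
{
    float half_min = FLT_MIN * 0.5f;    /* 2^-127, a denormal */

    assert(half_min != 0.0f);                       /* IEEE keeps it */
    assert(madd_nj(FLT_MIN, 0.5f, 0.0f) == 0.0f);   /* NJ=1 flushes it */
    assert(madd_nj(1.0f, 2.0f, 3.0f) == 5.0f);      /* normals unaffected */
}
```

Code that relies on gradual underflow sees the first result and silently gets the second, which is the kind of difference that shows up in a numerical test suite.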
> GCC has generated the Altivec fmaddfp and fnmsubfp instructions on VSX systems
> as an alternative to the xsmadd{a,m}sp and xsnmsub{a,m}sp instructions. The
> advantage of the Altivec instructions is that they are 4 operand instructions
> (i.e. the target register does not have to overlap with one of the input
> registers). The advantage is it can eliminate an extra move instruction. The
> disadvantage is it does not round the same way as the VSX instructions.
And it gets the VSCR[NJ] setting applied. Yup.
> This patch eliminates the generation of the Altivec fmaddfp and fnmsubfp
> instructions as alternatives in the VSX instruction insn support, and in the
> Altivec insns it adds a test to prevent the insn from being used if VSX is
> available. I also added a test to the regression test suite.
Please leave the latter out, it does not belong in this patch. If you
want a patch to do that, it should deal with *all* VMX FP insns: there
also are add, sub, mul, etc. Well, I think those (as well as madd and
nmsub) are the only ones that use the NJ bit or the RN bits, but please
check.
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -750,12 +750,15 @@ (define_insn "altivec_vsel<mode>4"
>
> ;; Fused multiply add.
>
> +;; If we are using VSX instructions, do not generate the vmaddfp instruction
> +;; since is has different rounding behavior than the xvmaddsp instruction.
> +
No blank lines please.
> (define_insn "*altivec_fmav4sf4"
> [(set (match_operand:V4SF 0 "register_operand" "=v")
> (fma:V4SF (match_operand:V4SF 1 "register_operand" "v")
> (match_operand:V4SF 2 "register_operand" "v")
> (match_operand:V4SF 3 "register_operand" "v")))]
> - "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
> + "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
This is very error-prone. Maybe add a test to the VECTOR_UNIT_ALTIVEC_P
macro instead?
> -;; Fused vector multiply/add instructions. Support the classical Altivec
> -;; versions of fma, which allows the target to be a separate register from the
> -;; 3 inputs. Under VSX, the target must be either the addend or the first
> -;; multiply.
> +;; Fused vector multiply/add instructions. Do not use the classical Altivec
(Two spaces after dot, and AltiVec is spelled with a capital V. I don't
like it either, VMX is a much nicer and more regular name).
> +;; versions of fma. Those instructions allows the target to be a separate
> +;; register from the 3 inputs, but they have different rounding behaviors.
>
> (define_insn "*vsx_fmav4sf4"
> - [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
> + [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
> (fma:V4SF
> - (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
> - (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
> - (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))]
> + (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
> + (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
> + (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))]
> "VECTOR_UNIT_VSX_P (V4SFmode)"
> "@
> xvmaddasp %x0,%x1,%x2
> - xvmaddmsp %x0,%x1,%x3
> - vmaddfp %0,%1,%2,%3"
> + xvmaddmsp %x0,%x1,%x3"
> [(set_attr "type" "vecfloat")])
So this part looks okay, and it alone is safe for GCC 13 as well.
> (define_insn "*vsx_nfmsv4sf4"
> - [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
> + [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
> (neg:V4SF
> (fma:V4SF
> - (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
> - (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
> + (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
> + (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
> (neg:V4SF
> - (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))))]
> + (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))))]
> "VECTOR_UNIT_VSX_P (V4SFmode)"
> "@
> xvnmsubasp %x0,%x1,%x2
> - xvnmsubmsp %x0,%x1,%x3
> - vnmsubfp %0,%1,%2,%3"
> + xvnmsubmsp %x0,%x1,%x3"
> [(set_attr "type" "vecfloat")])
Well, together with this of course :-)
Could you please do that?
Segher