On 03/10/16 19:06, Wilco Dijkstra wrote:
Evandro Menezes <e.mene...@samsung.com> wrote:
That's what I had in mind too, but around the approximation for x^-1/2
and using masks for vector cases thusly:
        fcmne   v3.4s, v0.4s, #0.0
        frsqrte v1.4s, v0.4s
        fmul    v2.4s, v1.4s, v1.4s
        frsqrts v2.4s, v0.4s, v2.4s
        fmul    v1.4s, v1.4s, v2.4s
        fmul    v2.4s, v1.4s, v1.4s
        frsqrts v2.4s, v0.4s, v2.4s
        fmul    v1.4s, v1.4s, v2.4s
        and     v1.4s, v3.4s
        fmul    v0.4s, v1.4s, v0.4s
That's possible but the overall latency is higher - according to exynos-1.md the
above takes 44 cycles while my version would be 37.
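For readers following along, the masked sequence quoted above can be modeled one lane at a time in portable C. This is only a sketch under stated assumptions: `rsqrt_estimate` is a hypothetical stand-in for FRSQRTE (it uses the well-known bit trick rather than the hardware estimate table), `rsqrt_step` mirrors the FRSQRTS formula (3 - a * b) / 2, and the function names are illustrative, not taken from the patch. It says nothing about the latency of either variant.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for FRSQRTE: a crude first estimate of
   1/sqrt(x).  The real instruction uses a hardware table.  */
static float
rsqrt_estimate (float x)
{
  uint32_t bits;
  float e;
  memcpy (&bits, &x, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  memcpy (&e, &bits, sizeof e);
  return e;
}

/* Same formula as FRSQRTS: (3 - a * b) / 2.  */
static float
rsqrt_step (float a, float b)
{
  return (3.0f - a * b) / 2.0f;
}

/* One lane of the quoted sequence, for non-negative X.  */
static float
approx_sqrt_lane (float x)
{
  /* "fcmne": all-ones where x != 0, all-zeros otherwise.  */
  uint32_t mask = (x != 0.0f) ? 0xffffffffu : 0u;
  uint32_t ebits;

  /* Estimate plus two Newton-Raphson refinements of 1/sqrt(x).  */
  float e = rsqrt_estimate (x);
  e *= rsqrt_step (x, e * e);
  e *= rsqrt_step (x, e * e);

  /* "and": clear the estimate where x == 0, so that the final
     multiply returns 0 for a zero input.  */
  memcpy (&ebits, &e, sizeof ebits);
  ebits &= mask;
  memcpy (&e, &ebits, sizeof e);

  /* sqrt(x) = x * (1/sqrt(x)).  */
  return e * x;
}
```

Calling `approx_sqrt_lane (4.0f)` gives a value close to 2, and a zero input gives 0 because the masked estimate is 0 before the last multiply.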
I'm currently working to get this prototyped without modifying the
reciprocal square root. Once I'm done, I'll merge both functions
together to generate better code.
I got the scalar version going, but I'm stuck with the vector version.
As you can see above, I need to use the complement of the mask produced
by FCMEQ to squelch the offending vector element. However, as FCMEQ is
defined in GCC, it produces an integer vector, and the SIMD AND only
takes integer vectors. I'm stuck on how to pass an FP vector to AND and
then pass its integer result back to an FP insn.
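To make the intended data flow concrete, here is a minimal sketch at the C level rather than in RTL, assuming GCC's (or Clang's) generic vector extensions. The type names `v4sf` and `v4si` and the function `squelch_zero_lanes` are hypothetical and chosen only for illustration. The comparison yields an integer vector, and the casts reinterpret the bits without converting values, which is the same round trip the RTL needs to express. It does not show how to write that in the machine description.

```c
/* Assumes GCC or Clang vector extensions.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int   v4si __attribute__ ((vector_size (16)));

/* Zero every lane of VAL whose corresponding lane of SRC is zero.  */
static v4sf
squelch_zero_lanes (v4sf val, v4sf src)
{
  v4sf zero = { 0.0f, 0.0f, 0.0f, 0.0f };
  v4si eq = (src == zero);      /* Like FCMEQ: -1 where src == 0.  */
  v4si keep = ~eq;              /* Complement of that mask.  */
  v4si bits = (v4si) val;       /* View the FP bits as integers.  */
  return (v4sf) (bits & keep);  /* AND, then view the bits as FP.  */
}
```

For example, with `src = { 0, 1, 4, 0 }` and `val = { 5, 6, 7, 8 }` the result keeps lanes 1 and 2 and clears lanes 0 and 3.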
Here's how the function stands at the moment:
void
aarch64_emit_approx_sqrt (rtx dst, rtx src)
{
  machine_mode mode = GET_MODE (src);
  gcc_assert (GET_MODE_INNER (mode) == SFmode
	      || GET_MODE_INNER (mode) == DFmode);
  bool scalar = !VECTOR_MODE_P (mode);
  bool narrow = (mode == V2SFmode);
  rtx xsrc = gen_reg_rtx (mode);
  emit_move_insn (xsrc, src);
  rtx xcc, xne, xmsk;
  if (scalar)
    {
      /* fcmp */
      xcc = aarch64_gen_compare_reg (NE, xsrc, CONST0_RTX (mode));
      xne = gen_rtx_NE (VOIDmode, xcc, const0_rtx);
    }
  else
    {
      machine_mode mcmp = mode_for_vector (int_mode_for_mode
	(GET_MODE_INNER (mode)), GET_MODE_NUNITS (mode));
      /* fcmne */
      xmsk = gen_reg_rtx (mode);
      /* Just V4SF for now */
      emit_insn (gen_aarch64_cmeqv4sf (xmsk, xsrc, CONST0_RTX (mode)));
      /* TODO: must use the complement of this result. */
    }
  /* Calculate the approximate reciprocal square root. */
  rtx xrsqrt = gen_reg_rtx (mode);
  aarch64_emit_approx_rsqrt (xrsqrt, xsrc);
  /* Calculate the approximate square root. */
  rtx xsqrt = gen_reg_rtx (mode);
  emit_set_insn (xsqrt, gen_rtx_MULT (mode, xrsqrt, xsrc));
  /* Qualify the result for when the input is zero. */
  rtx xdst = gen_reg_rtx (mode);
  if (scalar)
    /* fcsel */
    emit_set_insn (xdst, gen_rtx_IF_THEN_ELSE (mode, xne, xsqrt,
					       xsrc));
  else
    /* and */
    emit_set_insn (xdst, gen_rtx_AND (mode, xsqrt, xmsk));
  emit_move_insn (dst, xdst);
}
Any help is welcome.
Thank you,
--
Evandro Menezes