Hi,

If I understand correctly, the current implementation replaces

fdiv
fsqrt

by

frsqrte
for i=0 to 3
   fmul
   frsqrts
   fmul

So I think the gain depends on the latency of the frsqrts insn.

I see the patch has patterns for the vector versions of frsqrts, but does not enable them?

Regards,
Venkat.

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Dr. Philipp Tomsich
> Sent: Wednesday, June 24, 2015 10:22 PM
> To: Evandro Menezes
> Cc: Benedikt Huber; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
> estimation in -ffast-math
>
> Evandro,
>
> We've seen a 28% speed-up on gromacs in SPECfp for the (scalar) reciprocal
> sqrt.
>
> Also, the "reciprocal divide" patches are floating around in various of our
> git-tree, but aren't ready for public consumption, yet... I'll leave Benedikt
> to comment on potential timelines for getting that pushed out.
>
> Best,
> Philipp.
>
> > On 24 Jun 2015, at 18:42, Evandro Menezes <e.mene...@samsung.com> wrote:
> >
> > Benedikt,
> >
> > You beat me to it! :-) Do you have the implementation for dividing
> > using the Newton series as well?
> >
> > I'm not sure that the series is always for all data types and on all
> > processors. It would be useful to allow each AArch64 processor to
> > enable this or not depending on the data type. BTW, do you have some
> > tests showing the speed up?
> >
> > Thank you,
> >
> > --
> > Evandro Menezes Austin, TX
> >
> >> -----Original Message-----
> >> From: gcc-patches-ow...@gcc.gnu.org
> >> [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Benedikt Huber
> >> Sent: Thursday, June 18, 2015 7:04
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: benedikt.hu...@theobroma-systems.com;
> >> philipp.tomsich@theobroma-systems.com
> >> Subject: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
> >> estimation in -ffast-math
> >>
> >> AArch64 offers the instructions frsqrte and frsqrts, for rsqrt
> >> estimation and a Newton-Raphson step, respectively.
> >> There are ARMv8 implementations where this is faster than using fdiv
> >> and rsqrt.
> >> It runs three steps for double and two steps for float to achieve the
> >> needed precision.
> >>
> >> There is one caveat and open question.
> >> Since -ffast-math enables flush to zero intermediate values between
> >> approximation steps will be flushed to zero if they are denormal.
> >> E.g. This happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
> >> The test cases pass, but it is unclear to me whether this is expected
> >> behavior with -ffast-math.
> >>
> >> The patch applies to commit:
> >> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
> >>
> >> Please consider including this patch.
> >> Thank you and best regards,
> >> Benedikt Huber
> >>
> >> Benedikt Huber (1):
> >>   2015-06-15  Benedikt Huber  <benedikt.huber@theobroma-systems.com>
> >>
> >>  gcc/ChangeLog                            |   9 +++
> >>  gcc/config/aarch64/aarch64-builtins.c    |  60 ++++++++++++++++
> >>  gcc/config/aarch64/aarch64-protos.h      |   2 +
> >>  gcc/config/aarch64/aarch64-simd.md       |  27 ++++++++
> >>  gcc/config/aarch64/aarch64.c             |  63 +++++++++++++++++
> >>  gcc/config/aarch64/aarch64.md            |   3 +
> >>  gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113 +++++++++++++++++++++++++++++++
> >>  7 files changed, 277 insertions(+)
> >>  create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
> >>
> >> --
> >> 1.9.1