Benedikt,

Are you developing the reciprocal approximation just for 1/x proper or for any division, as in x/y = x * (1/y)?

Thank you,

-- 
Evandro Menezes                              Austin, TX

> -----Original Message-----
> From: Benedikt Huber [mailto:benedikt.hu...@theobroma-systems.com]
> Sent: Wednesday, June 24, 2015 12:11
> To: Dr. Philipp Tomsich
> Cc: Evandro Menezes; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
>
> Evandro,
>
> Yes, we also have the 1/x approximation.
> However, we do not have the test cases yet, and it would also need some
> cleanup.
> I am going to provide a patch for that soon (say next week).
> Also, for this optimization we have *not* yet found a benchmark with
> significant improvements.
>
> Best Regards,
> Benedikt
>
>
> > On 24 Jun 2015, at 18:52, Dr. Philipp Tomsich <philipp.tomsich@theobroma-systems.com> wrote:
> >
> > Evandro,
> >
> > We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar)
> > reciprocal sqrt.
> >
> > Also, the “reciprocal divide” patches are floating around in various
> > of our git trees, but aren’t ready for public consumption, yet… I’ll
> > leave Benedikt to comment on potential timelines for getting that
> > pushed out.
> >
> > Best,
> > Philipp.
> >
> >> On 24 Jun 2015, at 18:42, Evandro Menezes <e.mene...@samsung.com> wrote:
> >>
> >> Benedikt,
> >>
> >> You beat me to it! :-) Do you have the implementation for dividing
> >> using the Newton series as well?
> >>
> >> I'm not sure that the series is always for all data types and on all
> >> processors. It would be useful to allow each AArch64 processor to
> >> enable this or not depending on the data type. BTW, do you have some
> >> tests showing the speed up?
> >>
> >> Thank you,
> >>
> >> --
> >> Evandro Menezes                              Austin, TX
> >>
> >>> -----Original Message-----
> >>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Benedikt Huber
> >>> Sent: Thursday, June 18, 2015 7:04
> >>> To: gcc-patches@gcc.gnu.org
> >>> Cc: benedikt.hu...@theobroma-systems.com; philipp.tomsich@theobroma-systems.com
> >>> Subject: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
> >>>
> >>> AArch64 offers the instructions frsqrte and frsqrts, for rsqrt
> >>> estimation and a Newton-Raphson step, respectively.
> >>> There are ARMv8 implementations where this is faster than using fdiv
> >>> and fsqrt.
> >>> It runs three steps for double and two steps for float to achieve
> >>> the needed precision.
> >>>
> >>> There is one caveat and open question.
> >>> Since -ffast-math enables flush to zero, intermediate values between
> >>> approximation steps will be flushed to zero if they are denormal.
> >>> E.g., this happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
> >>> The test cases pass, but it is unclear to me whether this is
> >>> expected behavior with -ffast-math.
> >>>
> >>> The patch applies to commit:
> >>> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
> >>>
> >>> Please consider including this patch.
> >>> Thank you and best regards,
> >>> Benedikt Huber
> >>>
> >>> Benedikt Huber (1):
> >>>   2015-06-15  Benedikt Huber  <benedikt.hu...@theobroma-systems.com>
> >>>
> >>>  gcc/ChangeLog                            |   9 +++
> >>>  gcc/config/aarch64/aarch64-builtins.c    |  60 ++++++++++++++++
> >>>  gcc/config/aarch64/aarch64-protos.h      |   2 +
> >>>  gcc/config/aarch64/aarch64-simd.md       |  27 ++++++++
> >>>  gcc/config/aarch64/aarch64.c             |  63 +++++++++++++++++
> >>>  gcc/config/aarch64/aarch64.md            |   3 +
> >>>  gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113 +++++++++++++++++++++++++++++++
> >>>  7 files changed, 277 insertions(+)
> >>>  create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
> >>>
> >>> --
> >>> 1.9.1
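
For readers following the thread, the refinement described in the quoted patch message can be sketched in portable C. This is only an illustration under stated assumptions, not the code from the patch: the function names (rsqrt_refine, recip_refine, approx_div) are made up here, and the initial estimate is passed in by the caller as a stand-in for what frsqrte would deliver in hardware (roughly 8 correct bits). The reciprocal variant, which is what the x/y = x * (1/y) question is about, uses the analogous Newton-Raphson step; it too is an assumption about how such a division could be built, not a description of the unpublished patches.

```c
/* Illustrative sketch only; names and structure are hypothetical. */

/* Refine an estimate y0 of 1/sqrt(x) with Newton-Raphson steps:
 *     y' = y * (3 - x*y*y) / 2
 * The factor (3 - a*b)/2 is the quantity frsqrts computes.
 * Each step roughly doubles the number of correct bits, so an
 * 8-bit estimate needs two steps for float and three for double. */
static double rsqrt_refine(double x, double y0, int steps)
{
    double y = y0;
    for (int i = 0; i < steps; i++)
        y = y * (3.0 - x * y * y) * 0.5;
    return y;
}

/* Refine an estimate r0 of 1/y with Newton-Raphson steps:
 *     r' = r * (2 - y*r)
 * The residual 1 - y*r is squared by every step. */
static double recip_refine(double y, double r0, int steps)
{
    double r = r0;
    for (int i = 0; i < steps; i++)
        r = r * (2.0 - y * r);
    return r;
}

/* Division expressed as multiplication by a refined reciprocal.
 * The result is not guaranteed to be correctly rounded, which is
 * why this kind of rewrite belongs behind -ffast-math. */
static double approx_div(double x, double y, double r0, int steps)
{
    return x * recip_refine(y, r0, steps);
}
```

For example, rsqrt_refine(4.0, 0.498, 3) starts from an estimate that is off by about 0.4% and ends within rounding error of 0.5, while stopping after two steps leaves an error of a few parts in 10^10, which is enough for float but not for double.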