Kumar, what is the relative gain that you see on Cortex-A57?
Thanks,
Philipp.

> On 25 Jun 2015, at 17:35, Kumar, Venkataramanan <venkataramanan.ku...@amd.com> wrote:
>
> Changing to "1 step for float" and "2 steps for double" gives better gains
> now for gromacs on cortex-a57.
>
> Regards,
> Venkat.
>> -----Original Message-----
>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Benedikt Huber
>> Sent: Thursday, June 25, 2015 4:09 PM
>> To: pins...@gmail.com
>> Cc: gcc-patches@gcc.gnu.org; philipp.toms...@theobroma-systems.com
>> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
>>
>> Andrew,
>>
>>> This is NOT a win on thunderX at least for single precision because you have
>>> to do the divide and sqrt in the same time as it takes 5 multiplies (estimate
>>> and step are multiplies in the thunderX pipeline). Doubles is 10 multiplies
>>> which is just the same as what the patch does (but it is really slightly less
>>> than 10, I rounded up). So in the end this is NOT a win at all for thunderX
>>> unless we do one less step for both single and double.
>>
>> Yes, the expected benefit from rsqrt estimation is implementation specific.
>> If one has a better initial rsqrte or an application that can trade precision
>> for execution time, we could offer a command line option to do only 2 steps
>> for double and 1 step for float; similar to -mrecip-precision for PowerPC.
>> What are your thoughts on that?
>>
>> Best regards,
>> Benedikt