Kumar,

This is not unexpected: the initial estimate and each refinement iteration
add only a limited number of bits of precision. ARMv8 guarantees only a
minimum number of bits per operation; the exact number is specific to each
micro-arch.
Depending on your micro-architecture and on the number of precise bits that
a given benchmark requires, one may see miscompares.
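
To illustrate the bit counting, here is a minimal sketch in Python (not code
from the patch). It models the Newton-Raphson update that an estimate-plus-step
sequence performs and reports how many bits are correct after each step. The
8-bit starting precision and the helper names (simulated_estimate,
rsqrt_refine, correct_bits) are assumptions for illustration only; real
hardware estimates differ per micro-arch.

```python
import math


def simulated_estimate(a, bits):
    # Hypothetical stand-in for a hardware rsqrt estimate: the exact value
    # perturbed by a relative error of 2**-bits (assumed, not measured).
    return (1.0 / math.sqrt(a)) * (1.0 + 2.0 ** -bits)


def rsqrt_refine(a, x0, steps):
    # Refine an estimate x0 of 1/sqrt(a) with `steps` Newton-Raphson steps.
    x = x0
    for _ in range(steps):
        # Mathematically the update x * (3 - a*x*x) / 2; the (3 - p*q) / 2
        # part is what an FRSQRTS-style step instruction provides.
        x = x * (3.0 - a * x * x) / 2.0
    return x


def correct_bits(a, x):
    # Number of correct bits, measured as -log2 of the relative error.
    exact = 1.0 / math.sqrt(a)
    rel_err = abs(x - exact) / exact
    return float("inf") if rel_err == 0.0 else -math.log2(rel_err)


if __name__ == "__main__":
    a = 2.0
    x0 = simulated_estimate(a, 8)  # assumed 8-bit initial estimate
    for steps in range(4):
        bits = correct_bits(a, rsqrt_refine(a, x0, steps))
        print(f"{steps} step(s): about {bits:.1f} correct bits")
```

Each step maps a relative error e to roughly 1.5*e^2, so the number of correct
bits about doubles (minus roughly 0.6 bits) per step. Under the assumed 8-bit
estimate, one step stays below the 24-bit significand of float and two steps
stay below the 53-bit significand of double, which would be consistent with
the miscompares quoted below.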

Do you know the exact number of bits that the initial estimate and the 
subsequent refinement steps add for your micro-arch?

Thanks,
Philipp.

> On 29 Jun 2015, at 10:17, Kumar, Venkataramanan 
> <venkataramanan.ku...@amd.com> wrote:
> 
> 
> Hmm, after reducing the iterations to "1 step for float" and "2 steps for
> double", I got VE (miscompares) on the following benchmarks:
> 416.gamess
> 453.povray
> 454.calculix
> 459.GemsFDTD
> 
> Benedikt, I get an ICE for 444.namd with your patch; not sure if something
> is wrong in my local tree.
> 
> Regards,
> Venkat.
> 
>> -----Original Message-----
>> From: pins...@gmail.com [mailto:pins...@gmail.com]
>> Sent: Sunday, June 28, 2015 8:35 PM
>> To: Kumar, Venkataramanan
>> Cc: Dr. Philipp Tomsich; Benedikt Huber; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
>> estimation in -ffast-math
>> 
>> 
>> 
>> 
>> 
>>> On Jun 25, 2015, at 9:44 AM, Kumar, Venkataramanan
>>> <venkataramanan.ku...@amd.com> wrote:
>>> 
>>> I got around ~12% gain with -Ofast -mcpu=cortex-a57.
>> 
>> I get around 11-12% on thunderX with the patch plus the change that
>> decreases the iterations (1/2), compared to without the patch.
>> 
>> Thanks,
>> Andrew
>> 
>> 
>>> 
>>> Regards,
>>> Venkat.
>>> 
>>>> -----Original Message-----
>>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>>>> ow...@gcc.gnu.org] On Behalf Of Dr. Philipp Tomsich
>>>> Sent: Thursday, June 25, 2015 9:13 PM
>>>> To: Kumar, Venkataramanan
>>>> Cc: Benedikt Huber; pins...@gmail.com; gcc-patches@gcc.gnu.org
>>>> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root
>>>> (rsqrt) estimation in -ffast-math
>>>> 
>>>> Kumar,
>>>> 
>>>> what is the relative gain that you see on Cortex-A57?
>>>> 
>>>> Thanks,
>>>> Philipp.
>>>> 
>>>>>> On 25 Jun 2015, at 17:35, Kumar, Venkataramanan
>>>>> <venkataramanan.ku...@amd.com> wrote:
>>>>> 
>>>>> Changing to "1 step for float" and "2 steps for double" now gives
>>>>> better gains for gromacs on cortex-a57.
>>>>> 
>>>>> Regards,
>>>>> Venkat.
>>>>>> -----Original Message-----
>>>>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>>>>>> ow...@gcc.gnu.org] On Behalf Of Benedikt Huber
>>>>>> Sent: Thursday, June 25, 2015 4:09 PM
>>>>>> To: pins...@gmail.com
>>>>>> Cc: gcc-patches@gcc.gnu.org; philipp.tomsich@theobroma-systems.com
>>>>>> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root
>>>>>> (rsqrt) estimation in -ffast-math
>>>>>> 
>>>>>> Andrew,
>>>>>> 
>>>>>>> This is NOT a win on thunderX at least for single precision
>>>>>>> because you have to do the divide and sqrt in the same time as it
>>>>>>> takes 5 multiplies (estimate and step are multiplies in the
>>>>>>> thunderX pipeline). Doubles is 10 multiplies, which is just the
>>>>>>> same as what the patch does (but it is really slightly less than
>>>>>>> 10, I rounded up). So in the end this is NOT a win at all for
>>>>>>> thunderX unless we do one less step for both single and double.
>>>>>> 
>>>>>> Yes, the expected benefit from rsqrt estimation is
>>>>>> implementation-specific. If one has a better initial rsqrte or an
>>>>>> application that can trade precision for execution time, we could
>>>>>> offer a command line option to do only 2 steps for double and 1 step
>>>>>> for float, similar to -mrecip-precision for PowerPC.
>>>>>> What are your thoughts on that?
>>>>>> 
>>>>>> Best regards,
>>>>>> Benedikt
>>> 
