On Tue, Jul 15, 2014 at 05:20:31PM -0400, Michael Meissner wrote:
> I did some timing tests to compare the new PowerPC IEEE 128-bit results to the
> current implementation of long double using the IBM extended format.
> 
> The test consisted of a short loop over arrays of 1,024 elements, reading
> in two values, doing the operation, and then storing the result back.
> This loop in turn was done multiple times, with the idea that most of the
> values would be in the cache, and we didn't have to worry about pre-fetching,
> etc.
> 
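Just to check that I'm reading the setup right, is the kernel roughly the
following?  (TYPE, REPEAT, and the names are my guesses, not from your
mail.)

  #define N      1024
  #define REPEAT 100000                /* some large count; just a guess */
  #define TYPE   double                /* float, double, __float128, ... */

  static TYPE a[N], b[N], c[N];

  void
  add_loop (void)
  {
    /* The outer loop re-runs the kernel so the arrays stay in cache.  */
    for (int j = 0; j < REPEAT; j++)
      /* Read two values, do the operation, store the result back.  */
      for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
  }
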
> The float and double tests were done with vectorization disabled, while for
> the vector float and vector double tests the compiler was allowed to do its
> normal auto-vectorization.
> 
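And the scalar runs disabled the vectorizer with something like
-fno-tree-vectorize, while the vector runs just used the default -O3
auto-vectorization?  That is, something like (flags and file name are my
guess):

  gcc -O3 -fno-tree-vectorize bench.c    # scalar float/double runs
  gcc -O3 bench.c                        # vector float/double runs
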
> The number reported was how much longer the second column took over the first:

I assume you mean the other way around?

> Generally, __float128 is 2x slower than the current IBM extended double
> format, except for divide, where it is 5x slower.  I must say, the software
> floating point emulation routines worked well, and once the proper macros
> were set up, I only needed to override the type used for IEEE 128-bit.
> 
> Add loop
> ========
> 
> float       vs double:          2.00x

Why is float twice as slow as double?


Segher
