On Tue, Jul 15, 2014 at 05:20:31PM -0400, Michael Meissner wrote:
> I did some timing tests to compare the new PowerPC IEEE 128-bit results to
> the current implementation of long double using the IBM extended format.
>
> The test consisted of a short loop doing the operation over arrays of 1,024
> elements, reading in two values, doing the operation, and then storing the
> result back.  This loop in turn was run multiple times, with the idea that
> most of the values would be in the cache, and we didn't have to worry about
> pre-fetching, etc.
>
> The float and double tests were done with vectorization disabled, while for
> the vector float and vector double tests the compiler was allowed to do its
> normal auto-vectorization.
>
> The number reported was how much longer the second column took over the
> first:
I assume you mean the other way around?

> Generally, __float128 is 2x slower than the current IBM extended double
> format, except for divide, where it is 5x slower.  I must say, the software
> floating-point emulation routines worked well, and once the proper macros
> were set up, I only needed to override the type used for IEEE 128-bit.
>
> Add loop
> ========
>
> float vs double: 2.00x

Why is float twice as slow as double?


Segher
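For concreteness, the kind of add loop I read that description as is roughly
the following.  This is my own sketch, not Michael's actual harness: N,
REPEAT, and the use of clock () for timing are assumptions on my part, and
the element type would be swapped between float, double, long double, and
__float128 for the per-type runs.

#include <stdio.h>
#include <time.h>

#define N 1024
#define REPEAT 100000           /* assumed repeat count; keeps the
                                   1,024-element arrays hot in cache */

typedef double elt_t;           /* swap in float, long double, or __float128
                                   for the other runs (my assumption) */

static elt_t a[N], b[N], c[N];

int
main (void)
{
  /* Fill the inputs with something nontrivial.  */
  for (int i = 0; i < N; i++)
    {
      a[i] = (elt_t) i;
      b[i] = (elt_t) (N - i);
    }

  clock_t start = clock ();

  /* The "add loop": read two values, do the operation, store the result.
     Other operations (sub, mul, div) would be substituted here.  */
  for (int r = 0; r < REPEAT; r++)
    for (int i = 0; i < N; i++)
      c[i] = a[i] + b[i];

  clock_t end = clock ();

  /* Print a result element so the compiler cannot drop the work.  */
  printf ("add: %f s  (c[0] = %f)\n",
          (double) (end - start) / CLOCKS_PER_SEC, (double) c[0]);
  return 0;
}

Presumably the scalar float/double runs used something like
-fno-tree-vectorize, while the vector float/double runs left the compiler's
auto-vectorization enabled, as described above.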