Hi,

On Thu, 20 Oct 2011, Uros Bizjak wrote:

> This patch builds on recent patch by Michael (that implemented 
> fine-grained control on -mrecip option) and with -ffast-math emits 
> reciprocal sequences with additional NR step for vectorized SFmode 
> division and vectorized sqrtf(x).
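
For readers unfamiliar with the "NR step" mentioned above: a reciprocal sequence starts from a cheap, low-precision estimate of 1/a or 1/sqrt(a) and refines it with one Newton-Raphson iteration.  The following is only a minimal sketch of that arithmetic, not the code GCC emits; the function names are hypothetical, and the 2**-12 starting error is an assumed figure standing in for a hardware estimate.

```python
# One Newton-Raphson (NR) refinement step, in plain Python floats.
# The starting estimates are the exact values perturbed by an assumed
# relative error of 2**-12 (illustrative only).

def refine_recip(a, x0):
    """One NR step for 1/a: x1 = x0 * (2 - a*x0)."""
    return x0 * (2.0 - a * x0)

def refine_rsqrt(a, y0):
    """One NR step for 1/sqrt(a): y1 = 0.5 * y0 * (3 - a*y0*y0)."""
    return 0.5 * y0 * (3.0 - a * y0 * y0)

def rel_err(approx, exact):
    """Relative error of approx with respect to exact."""
    return abs(approx - exact) / abs(exact)

a = 3.0
eps = 2.0 ** -12                      # assumed error of the initial estimate
x0 = (1.0 / a) * (1.0 + eps)          # crude estimate of 1/a
y0 = (a ** -0.5) * (1.0 + eps)        # crude estimate of 1/sqrt(a)

x1 = refine_recip(a, x0)              # b/a is then computed as b * x1
y1 = refine_rsqrt(a, y0)              # sqrt(a) is then computed as a * y1

print("recip: before %.3g, after %.3g" % (rel_err(x0, 1.0 / a), rel_err(x1, 1.0 / a)))
print("rsqrt: before %.3g, after %.3g" % (rel_err(y0, a ** -0.5), rel_err(y1, a ** -0.5)))
```

Each step roughly squares the relative error of the estimate, which is why a single extra step is enough to get close to single-precision accuracy from an estimate of this quality.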

FWIW, I haven't gotten around to doing the same for cpu2006 yet, but here 
are the two polyhedron results (sandybridge, with baseflags "-Ofast 
-funroll-loops -fpeel-loops -march=corei7-avx -mveclibabi=svml -flto 
-fwhole-program", i.e. without increasing the inline limits, and linking 
against libimf and libsvml).  With the above flags:

   Benchmark   Compile  Executable   Ave Run  Number   Estim
        Name    (secs)     (bytes)    (secs) Repeats   Err %
   ---------   -------  ----------   ------- -------  ------
          ac      4.68     4086864      6.16       2  0.0211
      aermod     68.22     5603956     13.40       5  0.1864
         air     10.46     4961134      3.78       5  0.2888
    capacita      3.74     4213850     19.24       3  0.0998
     channel      1.44     4808524      1.22       5  0.2898
       doduc     12.64     4288238     19.91       5  0.1128
     fatigue      4.47     4217301      3.71       5  0.0989
     gas_dyn      6.92     4211997      3.43       5  2.8640
      induct      7.44     4385543     10.33       5  0.2719
       linpk      1.28     4053798      5.88       2  0.0647
        mdbx      3.97     4114107      7.63       5  0.1365
          nf      4.89     4147809      7.90       2  0.0380
     protein     15.07     5049415     20.70       5  0.7615
      rnflow     11.89     4260434     16.05       5  0.1359
    test_fpu      8.11     4207868      3.69       5  0.6687
        tfft      0.99     4110713      0.84       5  0.3024

Geometric Mean Execution Time =       6.35 seconds

With the above flags plus "-mrecip=vec-sqrt,vec-div":

   Benchmark   Compile  Executable   Ave Run  Number   Estim
        Name    (secs)     (bytes)    (secs) Repeats   Err %
   ---------   -------  ----------   ------- -------  ------
          ac      3.85     4086864      6.17       2  0.0227
      aermod     68.31     5603956     13.38       2  0.0019
         air     10.92     4961134      3.77       5  0.1367
    capacita      3.71     4213850     18.68       2  0.0391
     channel      1.41     4808524      1.22       5  0.3327
       doduc     12.66     4288238     19.93       5  0.2391
     fatigue      4.36     4217301      3.70       2  0.0567
     gas_dyn      6.91     4211997      2.31       2  0.0867
      induct      7.46     4385543     10.31       5  0.1201
       linpk      1.70     4053798      5.88       2  0.0383
        mdbx      3.98     4114107      7.68       5  0.4000
          nf      4.89     4147809      7.89       2  0.0348
     protein     14.00     5049415     20.51       2  0.0478
      rnflow     11.89     4260434     16.05       4  0.0837
    test_fpu      8.09     4207868      3.71       5  0.7097
        tfft      1.13     4110713      0.83       5  0.2290

Geometric Mean Execution Time =       6.18 seconds

I.e. gas_dyn improves quite a bit (3.43s -> 2.31s, as expected), and the 
rest still works.  I know that cpu2006 also works, but as I said I have no 
recent measurements for it; I'm going to take those now.
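
To make the comparison easier to check, here is a small sketch that recomputes the geometric means and the per-benchmark ratios from the "Ave Run" columns of the two tables above.  The variable and function names are just for illustration and are not part of the polyhedron harness.

```python
import math

# "Ave Run (secs)" columns copied from the two tables above.
names = ["ac", "aermod", "air", "capacita", "channel", "doduc", "fatigue",
         "gas_dyn", "induct", "linpk", "mdbx", "nf", "protein", "rnflow",
         "test_fpu", "tfft"]
base = [6.16, 13.40, 3.78, 19.24, 1.22, 19.91, 3.71, 3.43,
        10.33, 5.88, 7.63, 7.90, 20.70, 16.05, 3.69, 0.84]
recip = [6.17, 13.38, 3.77, 18.68, 1.22, 19.93, 3.70, 2.31,
         10.31, 5.88, 7.68, 7.89, 20.51, 16.05, 3.71, 0.83]

def geomean(xs):
    """Geometric mean, computed via the mean of the logarithms."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Ratio > 1 means the -mrecip run was faster than the baseline run.
ratios = {n: b / r for n, b, r in zip(names, base, recip)}
best = max(ratios, key=ratios.get)

print("geomean baseline: %.2f s" % geomean(base))
print("geomean -mrecip:  %.2f s" % geomean(recip))
print("largest change:   %s (%.2fx)" % (best, ratios[best]))
```

The geometric mean is used rather than the arithmetic mean so that long-running benchmarks such as protein or doduc do not dominate the summary.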


Ciao,
Michael.
