Hi,

On Thu, 20 Oct 2011, Uros Bizjak wrote:

> This patch builds on recent patch by Michael (that implemented
> fine-grained control on -mrecip option) and with -ffast-math emits
> reciprocal sequences with additional NR step for vectorized SFmode
> division and vectorized sqrtf(x).

FWIW, I didn't yet come to do the same for cpu2006, but here are the two
results of polyhedron (sandybridge, with baseflags "-Ofast -funroll-loops
-fpeel-loops -march=corei7-avx -mveclibabi=svml -flto -fwhole-program",
i.e. without increasing the inline limits, and linking against libimf and
libsvml).

With the above flags:

Benchmark  Compile  Executable  Ave Run   Number   Estim
Name        (secs)     (bytes)   (secs)  Repeats   Err %
---------  -------  ----------  -------  -------  ------
ac            4.68     4086864     6.16        2  0.0211
aermod       68.22     5603956    13.40        5  0.1864
air          10.46     4961134     3.78        5  0.2888
capacita      3.74     4213850    19.24        3  0.0998
channel       1.44     4808524     1.22        5  0.2898
doduc        12.64     4288238    19.91        5  0.1128
fatigue       4.47     4217301     3.71        5  0.0989
gas_dyn       6.92     4211997     3.43        5  2.8640
induct        7.44     4385543    10.33        5  0.2719
linpk         1.28     4053798     5.88        2  0.0647
mdbx          3.97     4114107     7.63        5  0.1365
nf            4.89     4147809     7.90        2  0.0380
protein      15.07     5049415    20.70        5  0.7615
rnflow       11.89     4260434    16.05        5  0.1359
test_fpu      8.11     4207868     3.69        5  0.6687
tfft          0.99     4110713     0.84        5  0.3024

Geometric Mean Execution Time = 6.35 seconds

With the above flags plus "-mrecip=vec-sqrt,vec-div":

Benchmark  Compile  Executable  Ave Run   Number   Estim
Name        (secs)     (bytes)   (secs)  Repeats   Err %
---------  -------  ----------  -------  -------  ------
ac            3.85     4086864     6.17        2  0.0227
aermod       68.31     5603956    13.38        2  0.0019
air          10.92     4961134     3.77        5  0.1367
capacita      3.71     4213850    18.68        2  0.0391
channel       1.41     4808524     1.22        5  0.3327
doduc        12.66     4288238    19.93        5  0.2391
fatigue       4.36     4217301     3.70        2  0.0567
gas_dyn       6.91     4211997     2.31        2  0.0867
induct        7.46     4385543    10.31        5  0.1201
linpk         1.70     4053798     5.88        2  0.0383
mdbx          3.98     4114107     7.68        5  0.4000
nf            4.89     4147809     7.89        2  0.0348
protein      14.00     5049415    20.51        2  0.0478
rnflow       11.89     4260434    16.05        4  0.0837
test_fpu      8.09     4207868     3.71        5  0.7097
tfft          1.13     4110713     0.83        5  0.2290

Geometric Mean Execution Time = 6.18 seconds

I.e. gas_dyn improves quite a bit (as expected), and the rest still works.
I know that cpu2006 also works, but as said have no recent measurements
for that, which I'm going to take now.


Ciao,
Michael.