On Tue, Aug 25, 2015 at 11:22 PM, Abe <abe_skol...@yahoo.com> wrote:
> Dear all,
>
> I have redone the SPEC2006 CPU FP tests after adding "-march=native".
> Unfortunately, the results are not very good for the new if-converter.  I
> believe this is because the CPU in question [details below] "only" has
> first-generation AVX and, from what I've been told, at least AVX2 is needed
> for scatter/gather and/or masked loads/stores, and possibly even AVX512
> [the 3rd generation].

Masked loads/stores are already available with the original AVX.  As I have
said repeatedly, scatter/gather is completely irrelevant and will not help
vectorizing if-conversion that uses scratch-pads.  And if you have masked
loads/stores available, you don't need scratch-pads.
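
To make the distinction concrete, here is a minimal C sketch of a conditional
store in three forms: the original branch, a scratch-pad rewrite, and a scalar
model of what a masked store does per group of lanes.  The function names and
the lane count of 8 are assumptions chosen for illustration; none of this is
taken from the GCC sources or from either if-converter.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: a[i] is written only when c[i] > 0. */
void store_branchy(float *a, const float *b, const int *c, size_t n)
{
    assert(n == 0 || (a && b && c));
    for (size_t i = 0; i < n; i++)
        if (c[i] > 0)
            a[i] = b[i];
}

/* Scratch-pad form (hypothetical rewrite): the store is unconditional,
   but its address is selected at run time, so the store target is no
   longer simply &a[i].  This is the added indirection. */
void store_scratchpad(float *a, const float *b, const int *c, size_t n)
{
    float scratch;  /* dummy target when the condition is false */
    for (size_t i = 0; i < n; i++) {
        float *p = (c[i] > 0) ? &a[i] : &scratch;
        *p = b[i];
    }
}

/* Scalar model of a masked store: compute a per-lane mask for a group
   of LANES consecutive elements, then write only the enabled lanes.
   The addresses stay consecutive; only the mask depends on the data. */
enum { LANES = 8 };  /* assumed group width, e.g. 8 floats in 256 bits */

void store_masked_model(float *a, const float *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i += LANES) {
        size_t width = (n - i < LANES) ? n - i : LANES;
        int mask[LANES];
        for (size_t l = 0; l < width; l++)
            mask[l] = c[i + l] > 0;
        for (size_t l = 0; l < width; l++)
            if (mask[l])
                a[i + l] = b[i + l];
    }
}
```

All three functions leave a[] in the same state; they differ only in how the
store is expressed, which is what matters to the vectorizer.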

>  As I have written before, in my opinion the new converter would be better
> than the old one if enough time and effort were spent on it, especially
> the time and effort to make it not add unneeded indirections.

I don't see how the new converter can be better for vectorization.  As soon
as you need to introduce a scratch-pad you are lost.

> First, I will give the totals.  Then, I'll give the CPU details, to make it
> clearer what "-march=native" did [or at least should have done].  Then,
> I'll give the per-subtest numbers that Richard requested.

It's interesting to see that only very few benchmarks care about store
if-conversion, or about if-conversion in general (because I believe the new
if-converter ends up disabling vectorization for all if-converted cases).
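
The per-subtest figures quoted below can be cross-checked with a short C
sketch.  The counts are copied from the lists in this mail; the type and
function names are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* OLD_PLAIN:  old converter, no flags (the list with only
               -ftree-loop-if-convert is identical).
   OLD_STORES: old converter, both flags.
   NEW_ANY:    new converter (all three flag settings are identical). */
enum column { OLD_PLAIN, OLD_STORES, NEW_ANY, NCOLS };

struct row { const char *name; int n[NCOLS]; };

static const struct row counts[] = {
    { "410.bwaves",    {   13,   13,   13 } },
    { "416.gamess",    { 3837, 3850, 3804 } },
    { "433.milc",      {    7,    7,    7 } },
    { "434.zeusmp",    {  138,  138,  136 } },
    { "435.gromacs",   {  172,  173,  173 } },
    { "436.cactusADM", {  261,  261,  261 } },
    { "437.leslie3d",  {   92,   92,   92 } },
    { "444.namd",      {    0,    0,    0 } },
    { "450.soplex",    {    1,    1,    1 } },
    { "454.calculix",  {  436,  436,  436 } },
    { "459.GemsFDTD",  {  275,  275,  275 } },
    { "465.tonto",     {  943,  943,  943 } },
    { "470.lbm",       {    0,    0,    0 } },
    { "481.wrf",       { 2141, 2141, 2079 } },
    { "482.sphinx3",   {   58,   58,   55 } },
    { "998.specrand",  {    0,    0,    0 } },
};

static const size_t nrows = sizeof counts / sizeof counts[0];

/* Sum of one column over all benchmarks. */
int column_total(enum column col)
{
    assert((int)col >= 0 && (int)col < NCOLS);
    int sum = 0;
    for (size_t i = 0; i < nrows; i++)
        sum += counts[i].n[col];
    return sum;
}

/* Number of benchmarks whose count differs between two columns. */
int rows_differing(enum column x, enum column y)
{
    int differing = 0;
    for (size_t i = 0; i < nrows; i++)
        if (counts[i].n[x] != counts[i].n[y])
            differing++;
    return differing;
}

/* Print the signed change (y minus x) for each benchmark that differs. */
void print_deltas(enum column x, enum column y)
{
    for (size_t i = 0; i < nrows; i++) {
        int d = counts[i].n[y] - counts[i].n[x];
        if (d != 0)
            printf("%-14s %+d\n", counts[i].name, d);
    }
}
```

Calling print_deltas(OLD_PLAIN, OLD_STORES) lists only 416.gamess and
435.gromacs, and the column totals reproduce the 8374, 8388 and 8275 reported
in the quoted mail.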

Richard.

> For concision, I will use "Richard's check-in" to refer to the GCC I built
> from Richard's check-in dated July 10 2015 with Git SHA
> "cb791e75379bc0c8b10bd13bcb24305c36fd504b" and
> "git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225652".
> [My reason for rebasing the relevant Git check-out to that point, quoting
> Richard's check-in message:
>   "PR tree-optimization/66823
>      * tree-if-conv.c (memrefs_read_or_written_unconditionally): Fix
>        inverted predicate."]
>
> All the compilations were done with "-Ofast".  Each result is the number
> of loops that were vectorized.
>
> Regards,
>
> Abe
>
>
>
>
>
>
>
>
>
>
> Richard's check-in
> [i.e. *_old_* converter]
> no if-conversion-specific flags
> -------------------------------
> 8374
>
>
> Richard's check-in
> [i.e. *_old_* converter]
> "-ftree-loop-if-convert" but NOT "-ftree-loop-if-convert-stores"
> ----------------------------------------------------------------
> 8374
>
>
> Richard's check-in
> [i.e. *_old_* converter]
> both "-ftree-loop-if-convert" AND "-ftree-loop-if-convert-stores"
> -----------------------------------------------------------------
> 8388
>
>
> ----
>
>
> patched version of Richard's check-in
> [i.e. *_new_* converter]
> no if-conversion-specific flags
> -------------------------------------
> 8275
>
>
> patched version of Richard's check-in
> [i.e. *_new_* converter]
> "-ftree-loop-if-convert" but NOT "-ftree-loop-if-convert-stores"
> ----------------------------------------------------------------
> 8275
>
>
> patched version of Richard's check-in
> [i.e. *_new_* converter]
> both "-ftree-loop-if-convert" AND "-ftree-loop-if-convert-stores"
> -----------------------------------------------------------------
> 8275
>
>
>
>
>
>
>
>
>
> CPU [from "/proc/cpuinfo"]
> --------------------------
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 45
> model name      : Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
> stepping        : 7
> microcode       : 0x710
> cpu MHz         : 2499.902
> cache size      : 15360 KB
> physical id     : 0
> siblings        : 12
> core id         : 0
> cpu cores       : 6
> apicid          : 0
> initial apicid  : 0
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 13
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx
> est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt
> tsc_deadline_timer aes xsave avx lahf_lm ida arat xsaveopt pln pts dtherm
> tpr_shadow vnmi flexpriority ept vpid
> bogomips        : 4999.80
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 46 bits physical, 48 bits virtual
> power management:
>
> [similarly for the cores numbered 1...23]
>
> kernel: 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64
> x86_64 x86_64 GNU/Linux
>
>
>
>
>
>
>
>
>
>
>
> Richard's check-in
> [i.e. *_old_* converter]
> no if-conversion-specific flags
> -------------------------------
> 410.bwaves: 13
> 416.gamess: 3837
> 433.milc: 7
> 434.zeusmp: 138
> 435.gromacs: 172
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2141
> 482.sphinx3: 58
> 998.specrand: 0
>
>
> Richard's check-in
> [i.e. *_old_* converter]
> "-ftree-loop-if-convert" but NOT "-ftree-loop-if-convert-stores"
> ----------------------------------------------------------------
> 410.bwaves: 13
> 416.gamess: 3837
> 433.milc: 7
> 434.zeusmp: 138
> 435.gromacs: 172
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2141
> 482.sphinx3: 58
> 998.specrand: 0
>
>
> Richard's check-in
> [i.e. *_old_* converter]
> both "-ftree-loop-if-convert" AND "-ftree-loop-if-convert-stores"
> -----------------------------------------------------------------
> 410.bwaves: 13
> 416.gamess: 3850
> 433.milc: 7
> 434.zeusmp: 138
> 435.gromacs: 173
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2141
> 482.sphinx3: 58
> 998.specrand: 0
>
>
> ----
>
>
> patched version of Richard's check-in
> [i.e. *_new_* converter]
> no if-conversion-specific flags
> -------------------------------------
> 410.bwaves: 13
> 416.gamess: 3804
> 433.milc: 7
> 434.zeusmp: 136
> 435.gromacs: 173
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2079
> 482.sphinx3: 55
> 998.specrand: 0
>
>
> patched version of Richard's check-in
> [i.e. *_new_* converter]
> "-ftree-loop-if-convert" but NOT "-ftree-loop-if-convert-stores"
> ----------------------------------------------------------------
> 410.bwaves: 13
> 416.gamess: 3804
> 433.milc: 7
> 434.zeusmp: 136
> 435.gromacs: 173
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2079
> 482.sphinx3: 55
> 998.specrand: 0
>
>
> patched version of Richard's check-in
> [i.e. *_new_* converter]
> both "-ftree-loop-if-convert" AND "-ftree-loop-if-convert-stores"
> -----------------------------------------------------------------
> 410.bwaves: 13
> 416.gamess: 3804
> 433.milc: 7
> 434.zeusmp: 136
> 435.gromacs: 173
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2079
> 482.sphinx3: 55
> 998.specrand: 0
>
>
>
