On Tue, Aug 25, 2015 at 11:22 PM, Abe <abe_skol...@yahoo.com> wrote:
> Dear all,
>
> I have redone the SPEC2006 CPU FP tests again after adding
> "-march=native".  Unfortunately, the results are not very good for the
> new if-converter.  I believe this is the case because the CPU in
> question [details below] "only" has first-generation AVX, and, from
> what I`ve been told, at least AVX2 is needed for scatter/gather and/or
> masked loads/stores, and possibly even AVX512 [the 3rd generation].

masked loads/stores are available with original AVX already.  As said
repeatedly scatter / gather is completely irrelevant and will not help
vectorizing if-conversion using scratch-pads.  And if you have masked
loads/stores available you don't need scratch-pads.

> As I have written before, in my opinion the new converter would be
> better than the old one if enough time and effort were to be spent on
> it, especially the time and effort to make it not add unneeded
> indirections.

I don't see how the new converter can be better for vectorization.  As
soon as you need to introduce a scratch-pad you are lost.

> First, I will give the totals.  Then, I`ll give the CPU details for
> better understanding what "-march=native" did [or at least should have
> done].  Then, I`ll give the per-subtest numbers that Richard requested.

It's interesting to see that only very few benchmarks care about store
if-conversion and if-conversion in general (because I believe the new
if-converter ends up disabling vectorization for all if-converted cases).

Richard.

> For concision, I will use "Richard`s check-in" to refer to the GCC I
> built from Richard`s check-in dated July 10 2015 with Git SHA
> "cb791e75379bc0c8b10bd13bcb24305c36fd504b" and "git-svn-id:
> svn+ssh://gcc.gnu.org/svn/gcc/trunk@225652".
> [my reason for rebasing the relevant Git check-out to that point:
> quoting Richard`s check-in message:
>   "PR tree-optimization/66823
>    * tree-if-conv.c (memrefs_read_or_written_unconditionally): Fix
>    inverted predicate."]
>
> All the compilations were done with "-Ofast".  The results, all
> integers, are the number of loops that were vectorized.
>
> Regards,
>
> Abe
>
>
> Richard`s check-in
> [i.e. *_old_* converter]
> no if-conversion-specific flags
> -------------------------------
> 8374
>
>
> Richard`s check-in
> [i.e. *_old_* converter]
> "-ftree-loop-if-convert" but NOT "-ftree-loop-if-convert-stores"
> ----------------------------------------------------------------
> 8374
>
>
> Richard`s check-in
> [i.e. *_old_* converter]
> both "-ftree-loop-if-convert" AND "-ftree-loop-if-convert-stores"
> -----------------------------------------------------------------
> 8388
>
>
> ----
>
>
> patched version of Richard`s check-in
> [i.e. *_new_* converter]
> no if-conversion-specific flags
> -------------------------------------
> 8275
>
>
> patched version of Richard`s check-in
> [i.e. *_new_* converter]
> "-ftree-loop-if-convert" but NOT "-ftree-loop-if-convert-stores"
> ----------------------------------------------------------------
> 8275
>
>
> patched version of Richard`s check-in
> [i.e. *_new_* converter]
> both "-ftree-loop-if-convert" AND "-ftree-loop-if-convert-stores"
> -----------------------------------------------------------------
> 8275
>
>
> CPU [from "/proc/cpuinfo"]
> --------------------------
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 45
> model name      : Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
> stepping        : 7
> microcode       : 0x710
> cpu MHz         : 2499.902
> cache size      : 15360 KB
> physical id     : 0
> siblings        : 12
> core id         : 0
> cpu cores       : 6
> apicid          : 0
> initial apicid  : 0
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 13
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx
> est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt
> tsc_deadline_timer aes xsave avx lahf_lm ida arat xsaveopt pln pts dtherm
> tpr_shadow vnmi flexpriority ept vpid
> bogomips        : 4999.80
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 46 bits physical, 48 bits virtual
> power management:
>
> [similarly for the cores numbered 1...23]
>
> kernel: 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64
> x86_64 x86_64 GNU/Linux
>
>
> Richard`s check-in
> [i.e. *_old_* converter]
> no if-conversion-specific flags
> -------------------------------
> 410.bwaves: 13
> 416.gamess: 3837
> 433.milc: 7
> 434.zeusmp: 138
> 435.gromacs: 172
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2141
> 482.sphinx3: 58
> 998.specrand: 0
>
>
> Richard`s check-in
> [i.e. *_old_* converter]
> "-ftree-loop-if-convert" but NOT "-ftree-loop-if-convert-stores"
> ----------------------------------------------------------------
> 410.bwaves: 13
> 416.gamess: 3837
> 433.milc: 7
> 434.zeusmp: 138
> 435.gromacs: 172
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2141
> 482.sphinx3: 58
> 998.specrand: 0
>
>
> Richard`s check-in
> [i.e. *_old_* converter]
> both "-ftree-loop-if-convert" AND "-ftree-loop-if-convert-stores"
> -----------------------------------------------------------------
> 410.bwaves: 13
> 416.gamess: 3850
> 433.milc: 7
> 434.zeusmp: 138
> 435.gromacs: 173
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2141
> 482.sphinx3: 58
> 998.specrand: 0
>
>
> ----
>
>
> patched version of Richard`s check-in
> [i.e. *_new_* converter]
> no if-conversion-specific flags
> -------------------------------------
> 410.bwaves: 13
> 416.gamess: 3804
> 433.milc: 7
> 434.zeusmp: 136
> 435.gromacs: 173
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2079
> 482.sphinx3: 55
> 998.specrand: 0
>
>
> patched version of Richard`s check-in
> [i.e. *_new_* converter]
> "-ftree-loop-if-convert" but NOT "-ftree-loop-if-convert-stores"
> ----------------------------------------------------------------
> 410.bwaves: 13
> 416.gamess: 3804
> 433.milc: 7
> 434.zeusmp: 136
> 435.gromacs: 173
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2079
> 482.sphinx3: 55
> 998.specrand: 0
>
>
> patched version of Richard`s check-in
> [i.e. *_new_* converter]
> both "-ftree-loop-if-convert" AND "-ftree-loop-if-convert-stores"
> -----------------------------------------------------------------
> 410.bwaves: 13
> 416.gamess: 3804
> 433.milc: 7
> 434.zeusmp: 136
> 435.gromacs: 173
> 436.cactusADM: 261
> 437.leslie3d: 92
> 444.namd: 0
> 450.soplex: 1
> 454.calculix: 436
> 459.GemsFDTD: 275
> 465.tonto: 943
> 470.lbm: 0
> 481.wrf: 2079
> 482.sphinx3: 55
> 998.specrand: 0
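
For readers following the scratch-pad versus masked-store argument in the
reply above, here is a rough C sketch of the three shapes a conditionally
stored loop can take.  It is only an illustration: the function names
(cond_store, scratchpad_store, select_store) are made up for this note, and
the scratch-pad form is an assumption about the general idea, not code taken
from the patch under discussion.  The select form mirrors the example the GCC
manual gives for "-ftree-loop-if-convert-stores".

```c
#include <stddef.h>

/* Source form: a[i] is written only where c[i] is nonzero. */
void cond_store(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (c[i])
            a[i] = b[i] + 1;
}

/* Scratch-pad form (illustrative assumption): the branch is removed by
   redirecting the store to a dummy location when the condition is false.
   The store address now depends on c[i], so it is no longer a plain
   a[i] access; this is the extra indirection the thread refers to. */
void scratchpad_store(int *a, const int *b, const int *c, size_t n)
{
    int scratch = 0;
    for (size_t i = 0; i < n; i++) {
        int *p = c[i] ? &a[i] : &scratch;
        *p = b[i] + 1;
    }
}

/* Select form: the store becomes unconditional and always targets a[i];
   the old value is written back where the condition is false.  A masked
   store gives the same visible result without the write-back, and this
   form can introduce data races in multi-threaded code. */
void select_store(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = c[i] ? b[i] + 1 : a[i];
}
```

All three functions leave a[] in the same final state for single-threaded
use; they differ only in which addresses each iteration stores to, which is
what matters to the vectorizer.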