On Thu, Nov 21, 2013 at 7:05 AM, Xinliang David Li <davi...@google.com> wrote:
> Would it be sufficient to
>
> 1) get rid of the 'may_increase_size' parameter in all the unroll
> interfaces (basically make it true for O2); and
> 2) set MAX_COMPLETELY_PEELED_INSNS parameter to be a smaller value for
> O2? -- this makes O2 and O3's complete unroll behave in the same way
> but with different parameter. Note that doing so is very similar to
> loop vectorization at O2 -- O2 requires a cheap cost model which
> lowers the value for related parameter such as # of alias checks. See
> how this is done in opts.c

I agree that yet another param is bad.

> David
>
> On Wed, Nov 20, 2013 at 7:41 PM, Sriraman Tallam <tmsri...@google.com> wrote:
>> Hi,
>>
>> Currently, tree unrolling pass(cunroll) does not allow any code
>> size growth in O2 mode. Code size growth is permitted only if O3 or
>> funroll-loops/fpeel-loops is used. I have created a patch to allow
>> partial code size increase in O2 mode. With funroll-loops the maximum
>> allowed code growth is 100 unrolled insns. For partial growth, I
>> experimented with various values of code growth and I have attached
>> SPEC 2006 performance numbers for code growth from 20 to 100 insns in
>> steps of 20.
>>
>> For this patch, I have set the partial code growth in O2 mode to be
>> 40 insns (tunable via param) where we get performance improvements
>> with minimal code size growth. Perf. data shows good improvements in
>> a few benchmarks. h264, sjeng and bzip2 get >2% improvement.
>> calculix shows a big regression(4.5% on westmere) which I am
>> investigating along with the povray regression.

Did you look at compile-time effects? Note that you should avoid
complete peeling here (unrolling based on max_iter) as well I think.
40 instructions is a lot to allow given the optimistic unrolling. See
PRs we have where even with the current code we unroll way too much
for -O2.

Richard.

>> I also ran experiments with -ftree-vectorize turned on with -O2
>> both in baseline and with the partial unroll to study the effect of
>> unrolling on vectorization. Loop unrolling seems to benefit more
>> benchmarks when vectorization is turned on.
>>
>> I have attached the patch and pdfs of the perf. data. and code size
>> growth.
>>
>> How to read the attached perf data:
>>
>> There are two data files.
>>
>> * spec_perf_O2_unroll.txt contains perf data using unrolling with
>> various code size growth on O2.
>> * spec_perf_O2_vectorize_unroll.txt contains perf data using
>> unrolling with various code size growth on O2 + ftree-vectorize.
>>
>> Each file contains perf. improvements and code size growth data.
>> Experiments were done on Ibis-sandybridge and Ikaria-westmere.
>>
>> Here is a sample from the file (All perf. numbers are in %):
>>
>> Unroll insns code growth       20     40     60     80    100
>> _____________________________________________________________
>> spec/2006/fp/C++/444.namd    -3.2  -0.13   -0.4  -0.57  -0.31
>>
>> This data shows that namd regressed by 3.2% over baseline when code
>> size growth was set to 20 insns and regressed by 0.57% over baseline
>> when growth was 80 insns.
>>
>> Please let me know what you think.
>>
>> Thanks
>> Sri
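
For anyone reading the sample row quoted above, here is a minimal sketch of how the columns pair up with the values. It assumes each data row is whitespace-separated (benchmark name first, then one percentage per growth limit); the actual layout of the attached files is not shown in this thread, and the names parse_row, growth_limits and sample_row are made up for illustration.

```python
# Growth limits (unrolled insns) and the namd row, copied from the sample
# quoted above. Whitespace-separated fields are an assumption about the
# attached files, not something the thread confirms.
growth_limits = [20, 40, 60, 80, 100]
sample_row = "spec/2006/fp/C++/444.namd -3.2 -0.13 -0.4 -0.57 -0.31"


def parse_row(row, limits):
    """Return (benchmark, {growth limit: % change over baseline})."""
    fields = row.split()
    name = fields[0]
    values = [float(v) for v in fields[1:]]
    if len(values) != len(limits):
        raise ValueError("row does not match the header columns")
    return name, dict(zip(limits, values))


name, deltas = parse_row(sample_row, growth_limits)
for limit, delta in deltas.items():
    # Negative numbers are regressions over baseline, as in the quoted text.
    verdict = "regression" if delta < 0 else "no regression"
    print(f"{name}: growth {limit} insns -> {delta:+.2f}% ({verdict})")
```

Read this way, the namd row pairs -3.2 with the 20-insn limit and -0.57 with the 80-insn limit, matching the explanation in the quoted mail.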