Would it be sufficient to 1) get rid of the 'may_increase_size' parameter in all the unroll interfaces (basically make it true for O2); and 2) set the MAX_COMPLETELY_PEELED_INSNS parameter to a smaller value for O2? This makes complete unrolling at O2 and O3 behave the same way, just with different parameter values. Note that doing so is very similar to loop vectorization at O2: O2 requires a cheap cost model, which lowers the values of related parameters such as the number of alias checks. See how this is done in opts.c.
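To make the suggestion concrete, here is a rough sketch of the idea in plain C. It is not GCC code: both function names are made up for illustration, the budgets of 40 and 100 insns are simply the numbers mentioned in the quoted message below, and treating levels below O2 as a zero budget is my own assumption.

```c
#include <stdbool.h>

/* Hypothetical per-level budget for complete unrolling, in insns.
   40 (O2) and 100 (O3) are the figures quoted in this thread, not
   necessarily GCC's real defaults.  Levels below O2 get a budget of
   zero, which is an assumption made for this sketch.  */
static int
max_completely_peeled_insns (int opt_level)
{
  if (opt_level >= 3)
    return 100;
  if (opt_level == 2)
    return 40;
  return 0;
}

/* Hypothetical decision helper.  There is no separate
   may_increase_size flag: every level runs the same check and
   only the budget differs.  GROWTH_INSNS is the estimated code
   size growth caused by unrolling.  */
static bool
can_completely_unroll (int opt_level, int growth_insns)
{
  return growth_insns <= max_completely_peeled_insns (opt_level);
}
```

With this shape, can_completely_unroll (2, 40) is true and can_completely_unroll (2, 41) is false, while O3 accepts growth up to 100 insns. In GCC itself the per-level value would presumably be set alongside the other defaults in opts.c rather than hard-coded like this.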
David

On Wed, Nov 20, 2013 at 7:41 PM, Sriraman Tallam <tmsri...@google.com> wrote:
> Hi,
>
> Currently, tree unrolling pass(cunroll) does not allow any code
> size growth in O2 mode. Code size growth is permitted only if O3 or
> funroll-loops/fpeel-loops is used. I have created a patch to allow
> partial code size increase in O2 mode. With funroll-loops the maximum
> allowed code growth is 100 unrolled insns. For partial growth, I
> experimented with various values of code growth and I have attached
> SPEC 2006 performance numbers for code growth from 20 to 100 insns in
> steps of 20.
>
> For this patch, I have set the partial code growth in O2 mode to be
> 40 insns (tunable via param) where we get performance improvements
> with minimal code size growth. Perf. data shows good improvements in
> a few benchmarks. h264, sjeng and bzip2 get >2% improvement.
> calculix shows a big regression(4.5% on westmere) which I am
> investigating along with the povray regression.
>
> I also ran experiments with -ftree-vectorize turned on with -O2
> both in baseline and with the partial unroll to study the effect of
> unrolling on vectorization. Loop unrolling seems to benefit more
> benchmarks when vectorization is turned on.
>
> I have attached the patch and pdfs of the perf. data. and code size growth.
>
> How to read the attached perf data:
>
> There are two data files.
>
> * spec_perf_O2_unroll.txt contains perf data using unrolling with
> various code size growth on O2.
> * spec_perf_O2_vectorize_unroll.txt contains perf data using
> unrolling with various code size growth on O2 + ftree-vectorize.
>
> Each file contains perf. improvements and code size growth data.
> Experiments were done on Ibis-sandybridge and Ikaria-westmere.
>
> Here is a sample from the file (All perf. numbers are in %):
>
> Unroll insns code growth       20     40     60     80    100
> _____________________________________________________________
> spec/2006/fp/C++/444.namd    -3.2  -0.13   -0.4  -0.57  -0.31
>
> This data shows that namd regressed by 3.2% over baseline when code
> size growth was set to 20 insns and regressed by 0.57% over baseline
> when growth was 80 insns.
>
> Please let me know what you think.
>
> Thanks
> Sri