On Thu, Nov 21, 2013 at 7:05 AM, Xinliang David Li <davi...@google.com> wrote:
> Would it be sufficient to
>
> 1) get rid of the 'may_increase_size' parameter in all the unroll
> interfaces (basically make it true for O2); and
> 2) set the MAX_COMPLETELY_PEELED_INSNS parameter to a smaller value for
> O2? This makes complete unrolling at O2 and O3 behave the same way,
> just with different parameter values. Note that this is very similar to
> loop vectorization at O2: O2 requires a cheap cost model, which
> lowers the values of related parameters such as the number of alias
> checks. See how this is done in opts.c.

I agree that yet another param is bad.
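
For illustration, the idea in David's point 2 (a single complete-unroll
size budget whose default depends on the optimization level, instead of a
separate may_increase_size switch plus a new param) can be modeled roughly
as below. This is a toy Python sketch, not GCC code: the table name, the
function names, and the "body size times iteration count" size estimate
are all assumptions made for the example, and the values 40 and 100 are
simply the numbers mentioned in this thread.

```python
# Toy model only. Names and the size estimate are illustrative assumptions,
# not GCC's actual implementation or defaults.

# Hypothetical per-level default for the complete-unroll size budget.
# 40 and 100 are the figures discussed in this thread.
HYPOTHETICAL_PEEL_BUDGET = {2: 40, 3: 100}


def peel_budget(opt_level, user_override=None):
    """Return the size budget (in insns) for the given optimization level.

    An explicit user value, if given, wins over the per-level default.
    Levels not in the table get a budget of 0 in this model.
    """
    if user_override is not None:
        return user_override
    return HYPOTHETICAL_PEEL_BUDGET.get(opt_level, 0)


def may_completely_unroll(body_insns, iterations, opt_level, user_override=None):
    """Decide whether a loop fits the budget once fully unrolled.

    The unrolled size is estimated naively as body size times trip count.
    """
    unrolled_size = body_insns * iterations
    return unrolled_size <= peel_budget(opt_level, user_override)
```

With this model an 8-insn loop body that iterates 4 times fits the O2
budget (32 <= 40), while the same body iterating 8 times fits only the O3
budget (64 <= 100). The same code path serves both levels; only the
budget differs.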

> David
>
> On Wed, Nov 20, 2013 at 7:41 PM, Sriraman Tallam <tmsri...@google.com> wrote:
>> Hi,
>>
>>     Currently, the tree unrolling pass (cunroll) does not allow any code
>> size growth in O2 mode.  Code size growth is permitted only if O3 or
>> -funroll-loops/-fpeel-loops is used. I have created a patch to allow
>> partial code size increase in O2 mode. With -funroll-loops the maximum
>> allowed code growth is 100 unrolled insns. For partial growth, I
>> experimented with various values of code growth and I have attached
>> SPEC 2006 performance numbers for code growth from 20 to 100 insns in
>> steps of 20.
>>
>>    For this patch, I have set the partial code growth in O2 mode to
>> 40 insns (tunable via a param), where we get performance improvements
>> with minimal code size growth.  Perf. data shows good improvements in
>> a few benchmarks: h264, sjeng and bzip2 each improve by more than 2%.
>> calculix shows a big regression (4.5% on westmere), which I am
>> investigating along with the povray regression.

Did you look at compile-time effects?  Note that you should avoid
complete peeling here (unrolling based on max_iter) as well I think.
40 instructions is a lot to allow given the optimistic unrolling.

See PRs we have where even with the current code we unroll way
too much for -O2.

Richard.

>>    I also ran experiments with -ftree-vectorize turned on with -O2
>> both in baseline and with the partial unroll to study the effect of
>> unrolling on vectorization. Loop unrolling seems to benefit more
>> benchmarks when vectorization is turned on.
>>
>>    I have attached the patch and pdfs of the perf. data and code size
>> growth.
>>
>> How to read the attached perf data:
>>
>> There are two data files.
>>
>> * spec_perf_O2_unroll.txt contains perf data using unrolling with
>> various code size growth on O2.
>> * spec_perf_O2_vectorize_unroll.txt contains perf data using
>> unrolling with various code size growth on O2 + -ftree-vectorize.
>>
>> Each file contains perf. improvements and code size growth data.
>> Experiments were done on Ibis-sandybridge and Ikaria-westmere.
>>
>> Here is a sample from the file (All perf. numbers are in %):
>>
>> Unroll insns code growth        20      40      60      80     100
>> __________________________________________________________________
>> spec/2006/fp/C++/444.namd     -3.2   -0.13    -0.4   -0.57   -0.31
>>
>> This data shows that namd regressed by 3.2% over baseline when code
>> size growth was set to 20 insns and regressed by 0.57% over baseline
>> when growth was 80 insns.
>>
>>    Please let me know what you think.
>>
>> Thanks
>> Sri
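
As a reading aid for the sample row quoted above, here is a minimal
Python sketch that pairs each code-growth setting with its reported
change. It assumes the data files use the same whitespace-separated
layout as the sample (the real attachments may differ); the parse_row
helper is hypothetical and the two input lines are copied from the
sample. Negative values are regressions relative to baseline, as
described in the message.

```python
# Header and data line copied from the sample in the message.
header = "Unroll insns code growth           20      40     60       80        100"
row = "spec/2006/fp/C++/444.namd     -3.2   -0.13   -0.4    -0.57      -0.31"


def parse_row(header_line, data_line, n_cols=5):
    """Map each code-growth setting (insns) to its perf change in %.

    Assumes the last n_cols whitespace-separated fields of both lines are
    the numeric columns and the first field of the data line is the
    benchmark name.
    """
    budgets = [int(tok) for tok in header_line.split()[-n_cols:]]
    fields = data_line.split()
    name = fields[0]
    deltas = [float(tok) for tok in fields[-n_cols:]]
    return name, dict(zip(budgets, deltas))


name, deltas = parse_row(header, row)

# The setting with the highest (least negative) change for this benchmark.
best = max(deltas, key=deltas.get)
```

For the namd row, deltas[20] is -3.2 and deltas[80] is -0.57, matching
the reading given in the message, and the least-bad setting in this row
is 40 insns.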
