On Wed, Jun 7, 2017 at 9:33 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Wed, Jun 7, 2017 at 10:07 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law <l...@redhat.com> wrote:
>>> On 06/02/2017 05:52 AM, Bin Cheng wrote:
>>>> Hi,
>>>> This patch enables -ftree-loop-distribution by default at -O3 and above
>>>> optimization levels.
>>>> Bootstrapped and tested at O2/O3 on x86_64 and AArch64.  Is it OK?
>>>>
>>>> Note I don't have a strong opinion here and am fine with it being
>>>> either accepted or rejected.
>>>>
>>>> Thanks,
>>>> bin
>>>> 2017-05-31  Bin Cheng  <bin.ch...@arm.com>
>>>>
>>>>       * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
>>>>       for -O3 and above levels.
>>> I think the question is how does this generally impact the performance
>>> of the generated code and to a lesser degree compile-time.
>>>
>>> Do you have any performance data?
>> Hi Jeff,
>> At this stage of the patch, only hmmer is affected in my local run of
>> spec2006 on x86_64 and AArch64, and it improves noticeably.  In the long
>> term, loop distribution is also a prerequisite transformation for
>> handling bwaves (at least).  For these two impacted cases, it helps to
>> close the gap against ICC.  I didn't check the compile-time slowdown;
>> we can restrict the pass to problems with a small number of partitions
>> if that turns out to be an issue.
>
> The source of extra compile-time will be dependence checking, which
> is quadratic; there is currently no limit in place on
> (# writes * (# reads + # writes)), but one could easily be added.
Ah yes, the patch now moves dependence computation before partition
construction, so that is more likely to be the bottleneck.
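
To make the quadratic term concrete, here is a minimal sketch of the kind of
limit being discussed. It is an illustration only: MAX_DEP_PAIRS,
dep_pair_count and loop_too_costly_p are hypothetical names, not existing GCC
parameters or functions, and the cap value is arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap on the number of reference pairs to analyze.
   This is an assumption for illustration, not an existing GCC param.  */
#define MAX_DEP_PAIRS 10000u

/* Number of reference pairs a naive dependence test would visit:
   every write is checked against every read and every write.  */
static size_t
dep_pair_count (size_t num_reads, size_t num_writes)
{
  return num_writes * (num_reads + num_writes);
}

/* Return true if the loop has too many reference pairs and
   distribution should be skipped before any dependence is computed.  */
static bool
loop_too_costly_p (size_t num_reads, size_t num_writes)
{
  return dep_pair_count (num_reads, num_writes) > MAX_DEP_PAIRS;
}
```

With such a check, a loop with 3 reads and 2 writes (10 pairs) would still be
analyzed, while one with 100 reads and 100 writes (20000 pairs) would be
skipped under this particular cap.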

>
> Note that I recently added -fopt-info support for loop distribution, so
> it should be possible to get an idea of how many loops in SPEC are
> distributed and, if that number is small, double-check them.
During development, quite a lot of loops got distributed.  I checked some
of them and restricted the pass so that it does not distribute cases
where there is no benefit.  I have not re-checked with the final version
of the patch, though.
>
> The cost model at this point is very conservative, but due to
> implementation details distributing a loop can cause quite some
> arithmetic to be duplicated, like for
>
> int a[1024], b[1204];
>
> void foo()
> {
>   for (int i = 0; i < 1024; ++i)
>     {
>        a[i] = i * i * i ... * i;
>        b[i] = a[i];
>     }
> }
>
> it will distribute to two loops both computing i * i * i .... rather than
> reading from a[i] in the second loop.
Hmm, this patch no longer distributes this case.  I think it is more
conservative than the original model; for example, the ldist tests
changed by the patch are no longer distributed because there is no
benefit in doing so.
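
For reference, the duplication described above can be written out by hand
roughly as follows. This is a hand-written sketch, not actual GCC output: the
function names are made up, the arrays are passed as parameters, and the
elided product is shortened to i * i * i.

```c
/* Original loop: the product is computed once per iteration and
   b[i] is a copy of the value just stored to a[i].  */
void
foo_fused (int *a, int *b, int n)
{
  for (int i = 0; i < n; ++i)
    {
      a[i] = i * i * i;
      b[i] = a[i];
    }
}

/* Distributed form with duplicated arithmetic: the second loop
   recomputes the product instead of reading it back from a[i].  */
void
foo_distributed_recompute (int *a, int *b, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = i * i * i;
  for (int i = 0; i < n; ++i)
    b[i] = i * i * i;
}

/* Distributed form without duplication: the second loop is a plain
   copy from a[] to b[].  */
void
foo_distributed_reload (int *a, int *b, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = i * i * i;
  for (int i = 0; i < n; ++i)
    b[i] = a[i];
}
```

All three variants store the same values; they differ only in how much
arithmetic is repeated, which is what the cost model has to weigh.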

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> bin
>>>
>>> jeff
>>>