On Wed, Jun 7, 2017 at 10:49 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Wed, Jun 7, 2017 at 9:33 AM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Wed, Jun 7, 2017 at 10:07 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>> On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law <l...@redhat.com> wrote:
>>>> On 06/02/2017 05:52 AM, Bin Cheng wrote:
>>>>> Hi,
>>>>> This patch enables -ftree-loop-distribution by default at -O3 and above
>>>>> optimization levels.
>>>>> Bootstrap and test at O2/O3 on x86_64 and AArch64.  is it OK?
>>>>>
>>>>> Note I don't have strong opinion here and am fine with either it's
>>>>> accepted or rejected.
>>>>>
>>>>> Thanks,
>>>>> bin
>>>>> 2017-05-31  Bin Cheng  <bin.ch...@arm.com>
>>>>>
>>>>>       * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
>>>>>       for -O3 and above levels.
>>>> I think the question is how does this generally impact the performance
>>>> of the generated code and to a lesser degree compile-time.
>>>>
>>>> Do you have any performance data?
>>> Hi Jeff,
>>> At this stage of the patch, only hmmer is impacted and improved
>>> obviously in my local run of spec2006 for x86_64 and AArch64.  In long
>>> term, loop distribution is also one prerequisite transformation to
>>> handle bwaves (at least).  For these two impacted cases, it helps to
>>> resolve the gap against ICC.  I didn't check compilation time slow
>>> down, we can restrict it to problem with small partition number if
>>> that's a problem.
>>
>> The source of extra compile-time will be dependence checking which
>> is quadratic, there is currently no limit in place on
>> (# writes * (# reads + # writes))
>> but one could easily be added.
> Ah yes, the patch moves dependence computation before partition
> construction now.  More likely this is the bottleneck now.

Ah, that's bad (didn't look at the patch yet).  The idea of the current
was that applying any cost based merging reduces the number of checks
that need to be done.  Do you absolutely need to perform dependence
checking upfront?

Richard.

>>
>> Note that I recently added -fopt-info support for loop distribution so
>> it should be
>> possible to get an idea how many loops in SPEC are distributed and if small,
>> double-check them.
> During development, quite a lot loops get distributed.  I checked some
> of them and restricted the pass to not distribute cases with no good.
> But I didn't check with the final version patch.
>>
>> The cost model at this point is very conservative but due to
>> implementation details
>> distributing a loop can cause quite some arithmetic to be duplicated like for
>>
>> int a[1024], b[1204];
>>
>> void foo()
>> {
>>   for (int i = 0; i < 1024; ++i)
>>     {
>>       a[i] = i * i * i ... * i;
>>       b[i] = a[i];
>>     }
>> }
>>
>> it will distribute to two loops both computing i * i * i .... rather than
>> reading from a[i] in the second loop.
> Hmm, this patch no longer distributes this case.  I think it is more
> conservative than the original model, for example, the ldist tests
> changed are now not distributed because there is no good to do it.
>
> Thanks,
> bin
>>
>> Richard.
>>
>>> Thanks,
>>> bin
>>>>
>>>> jeff
>>>>
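
For readers following the quoted example, here is a hand-written sketch of the three loop shapes being discussed. It is not actual GCC output. The helper expensive() is a hypothetical stand-in for the elided "i * i * i ... * i" expression, and the function names and the macro N are assumptions made for illustration only.

```c
#define N 1024

int a[N], b[N];

/* Hypothetical stand-in for the elided "i * i * i ... * i" expression.
   The result fits in int for every i below 1024.  */
static int expensive(int i)
{
  return i * i * i;
}

/* Shape of the original loop: one pass, the second statement reads a[i].  */
void foo_fused(void)
{
  for (int i = 0; i < N; ++i)
    {
      a[i] = expensive(i);
      b[i] = a[i];
    }
}

/* Distribution with duplicated arithmetic: both loops recompute the
   expression, which is the cost described in the quoted text.  */
void foo_distributed_dup(void)
{
  for (int i = 0; i < N; ++i)
    a[i] = expensive(i);
  for (int i = 0; i < N; ++i)
    b[i] = expensive(i);
}

/* Distribution that reloads from a[i] in the second loop instead of
   recomputing the expression.  */
void foo_distributed_reload(void)
{
  for (int i = 0; i < N; ++i)
    a[i] = expensive(i);
  for (int i = 0; i < N; ++i)
    b[i] = a[i];
}
```

All three functions leave the same values in a[] and b[]. They differ only in how much arithmetic and memory traffic each loop carries, which is the trade-off a cost model has to weigh.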