On Wed, Jun 7, 2017 at 9:33 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Wed, Jun 7, 2017 at 10:07 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law <l...@redhat.com> wrote:
>>> On 06/02/2017 05:52 AM, Bin Cheng wrote:
>>>> Hi,
>>>> This patch enables -ftree-loop-distribution by default at -O3 and above
>>>> optimization levels.
>>>> Bootstrap and test at O2/O3 on x86_64 and AArch64.  is it OK?
>>>>
>>>> Note I don't have strong opinion here and am fine with either it's
>>>> accepted or rejected.
>>>>
>>>> Thanks,
>>>> bin
>>>> 2017-05-31  Bin Cheng  <bin.ch...@arm.com>
>>>>
>>>>     * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
>>>>     for -O3 and above levels.
>>> I think the question is how does this generally impact the performance
>>> of the generated code and to a lesser degree compile-time.
>>>
>>> Do you have any performance data?
>> Hi Jeff,
>> At this stage of the patch, only hmmer is impacted and improved
>> obviously in my local run of spec2006 for x86_64 and AArch64.  In long
>> term, loop distribution is also one prerequisite transformation to
>> handle bwaves (at least).  For these two impacted cases, it helps to
>> resolve the gap against ICC.  I didn't check compilation time slow
>> down, we can restrict it to problem with small partition number if
>> that's a problem.
>
> The source of extra compile-time will be dependence checking which
> is quadratic, there is currently no limit in place on
> (# writes * (# reads + # writes)) but one could easily be added.
Ah yes, the patch moves dependence computation before partition
construction now.  More likely this is the bottleneck now.
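To put rough numbers on the quadratic term, here is a minimal sketch
(plain C, not GCC code) that counts the reference pairs using the
(# writes * (# reads + # writes)) figure quoted above and compares the
count against a cap.  The names dep_pairs, within_dep_limit and
MAX_DEP_PAIRS, as well as the cap value, are hypothetical and only for
illustration; as noted above, no such limit exists in the pass today.

```c
/* Hypothetical cap on the number of reference pairs to check.
   This is an illustrative value, not an existing GCC parameter.  */
#define MAX_DEP_PAIRS 10000UL

/* Number of reference pairs a dependence check would consider for a
   loop with N_READS reads and N_WRITES writes, following the
   (# writes * (# reads + # writes)) estimate.  */
static unsigned long
dep_pairs (unsigned long n_reads, unsigned long n_writes)
{
  return n_writes * (n_reads + n_writes);
}

/* Return 1 if the pair count stays within the hypothetical cap,
   0 if the loop would be skipped to bound compile time.  */
static int
within_dep_limit (unsigned long n_reads, unsigned long n_writes)
{
  return dep_pairs (n_reads, n_writes) <= MAX_DEP_PAIRS;
}
```

With these assumptions a loop with 2 reads and 2 writes gives 8 pairs,
while one with 200 reads and 100 writes gives 30000 pairs and would be
rejected by the illustrative cap.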
>
> Note that I recently added -fopt-info support for loop distribution so
> it should be possible to get an idea how many loops in SPEC are
> distributed and if small, double-check them.
During development, quite a lot of loops got distributed.  I checked
some of them and restricted the pass so that it does not distribute
cases where there is no benefit.  But I didn't check with the final
version of the patch.
>
> The cost model at this point is very conservative but due to
> implementation details distributing a loop can cause quite some
> arithmetic to be duplicated like for
>
> int a[1024], b[1204];
>
> void foo()
> {
>   for (int i = 0; i < 1024; ++i)
>     {
>       a[i] = i * i * i ... * i;
>       b[i] = a[i];
>     }
> }
>
> it will distribute to two loops both computing i * i * i .... rather than
> reading from a[i] in the second loop.
Hmm, this patch no longer distributes this case.  I think it is more
conservative than the original model; for example, the ldist tests that
changed are now not distributed because there is no benefit in doing so.

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> bin
>>>
>>> jeff
>>>
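
For reference, the duplicated-arithmetic concern in Richard's example
can be written out by hand as below.  This is only an illustration of
the three shapes being discussed, not output from the pass.  The helper
expensive() is hypothetical and uses a three-factor product as a
stand-in for the elided "i * i * i ... * i" expression; the arrays are
passed as parameters purely to make the variants easy to compare.

```c
#define N 1024

/* Hypothetical stand-in for the elided "i * i * i ... * i" chain.
   The three-factor product fits in int for all i < 1024.  */
static int
expensive (int i)
{
  return i * i * i;
}

/* Original shape: a single loop with both stores.  */
static void
foo_fused (int *a, int *b)
{
  for (int i = 0; i < N; ++i)
    {
      a[i] = expensive (i);
      b[i] = a[i];
    }
}

/* Distributed shape with the arithmetic duplicated in both loops.  */
static void
foo_distributed_recompute (int *a, int *b)
{
  for (int i = 0; i < N; ++i)
    a[i] = expensive (i);
  for (int i = 0; i < N; ++i)
    b[i] = expensive (i);
}

/* Distributed shape where the second loop reads a[i] instead.  */
static void
foo_distributed_reuse (int *a, int *b)
{
  for (int i = 0; i < N; ++i)
    a[i] = expensive (i);
  for (int i = 0; i < N; ++i)
    b[i] = a[i];
}
```

All three variants store the same values; they differ only in how often
the product is evaluated (once per iteration in the first and third,
twice in the second).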