On Wed, Jun 7, 2017 at 10:07 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law <l...@redhat.com> wrote:
>> On 06/02/2017 05:52 AM, Bin Cheng wrote:
>>> Hi,
>>> This patch enables -ftree-loop-distribution by default at -O3 and above
>>> optimization levels.
>>> Bootstrap and test at O2/O3 on x86_64 and AArch64.  is it OK?
>>>
>>> Note I don't have strong opinion here and am fine with either it's accepted
>>> or rejected.
>>>
>>> Thanks,
>>> bin
>>> 2017-05-31  Bin Cheng  <bin.ch...@arm.com>
>>>
>>>     * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
>>>     for -O3 and above levels.
>> I think the question is how does this generally impact the performance
>> of the generated code and to a lesser degree compile-time.
>>
>> Do you have any performance data?
> Hi Jeff,
> At this stage of the patch, only hmmer is impacted and improved
> obviously in my local run of spec2006 for x86_64 and AArch64.  In long
> term, loop distribution is also one prerequisite transformation to
> handle bwaves (at least).  For these two impacted cases, it helps to
> resolve the gap against ICC.  I didn't check compilation time slow
> down, we can restrict it to problem with small partition number if
> that's a problem.

The source of extra compile time will be dependence checking, which is
quadratic.  There is currently no limit in place on
(# writes * (# reads + # writes)), but one could easily be added.

Note that I recently added -fopt-info support for loop distribution, so
it should be possible to get an idea of how many loops in SPEC are
distributed and, if that number is small, to double-check them.

The cost model at this point is very conservative, but due to
implementation details distributing a loop can cause quite some
arithmetic to be duplicated, as in

  int a[1024], b[1204];
  void foo()
  {
    for (int i = 0; i < 1024; ++i)
      {
        a[i] = i * i * i ... * i;
        b[i] = a[i];
      }
  }

where it will distribute to two loops, both computing i * i * i ...,
rather than reading from a[i] in the second loop.

Richard.

> Thanks,
> bin
>>
>> jeff
>>
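
For illustration, here is a minimal hand-written sketch of the duplication described above. It is not GCC output. It assumes the elided product is simply i * i * i (which stays within int range for i < 1024), gives both arrays 1024 elements, and uses hypothetical names (fused, distributed_recompute, distributed_reload, check_variants) that do not come from the patch or from GCC.

```c
#include <assert.h>
#include <string.h>

#define N 1024

/* One pair of arrays per variant so the results can be compared. */
int a[N], b[N];         /* original, fused loop              */
int a_rc[N], b_rc[N];   /* distributed, product recomputed   */
int a_rl[N], b_rl[N];   /* distributed, second loop reloads  */

/* Original loop: the product is computed once per iteration. */
void fused(void)
{
  for (int i = 0; i < N; ++i)
    {
      a[i] = i * i * i;   /* assumed stand-in for "i * i * i ... * i" */
      b[i] = a[i];
    }
}

/* Shape described in the mail: two loops, each redoing the arithmetic. */
void distributed_recompute(void)
{
  for (int i = 0; i < N; ++i)
    a_rc[i] = i * i * i;
  for (int i = 0; i < N; ++i)
    b_rc[i] = i * i * i;  /* duplicated multiplication */
}

/* Cheaper alternative: the second loop reads the value back from memory. */
void distributed_reload(void)
{
  for (int i = 0; i < N; ++i)
    a_rl[i] = i * i * i;
  for (int i = 0; i < N; ++i)
    b_rl[i] = a_rl[i];    /* load instead of recomputing */
}

/* Run all three variants and check that they produce identical arrays. */
void check_variants(void)
{
  fused();
  distributed_recompute();
  distributed_reload();
  assert(memcmp(a, a_rc, sizeof a) == 0 && memcmp(b, b_rc, sizeof b) == 0);
  assert(memcmp(a, a_rl, sizeof a) == 0 && memcmp(b, b_rl, sizeof b) == 0);
}
```

All three variants store the same values; they differ only in how much arithmetic the second loop performs. Whether a given GCC actually distributes the fused loop depends on the version and the cost model. Compiling it with -O3 -ftree-loop-distribution -fopt-info-loop should show whether distribution is reported.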