On Wed, Jul 15, 2015 at 10:26 PM, Tom de Vries <tom_devr...@mentor.com> wrote:
> Hi,
>
> I tried to parallelize this fortran test-case (based on autopar/outer-1.c),
> specifically the outer loop of the first loop nest using
> -ftree-parallelize-loops=2:
> ...
> program main
>   implicit none
>   integer, parameter         :: n = 500
>   integer, dimension (0:n-1, 0:n-1) :: x
>   integer                    :: i, j, ii, jj
>
>
>   do ii = 0, n - 1
>      do jj = 0, n - 1
>         x(jj, ii) = ii + jj + 3
>      end do
>   end do
>
>   do i = 0, n - 1
>      do j = 0, n - 1
>         if (x(j, i) .ne. i + j + 3) call abort
>      end do
>   end do
>
> end program main
> ...
>
> But autopar fails to parallelize due to failing dependency analysis.
>
> I then tried to add -floop-parallelize-all, and found that the graphite
> dependency analysis did manage to decide that the iterations are
> independent.
>
> At https://gcc.gnu.org/wiki/Graphite/Parallelization I read:
> ...
> In GCC there already exists an auto-parallelization pass (tree-parloops.c),
> which is base on the lambda framework originally developed by Sebastian.
> Since Lambda framework is limited to some cases (e.g. triangle loops, loops
> with 'if' conditions), Graphite was developed to handle the loops that
> lambda was not able to handle .
> ...
>
> So I wondered, why not always use the graphite dependency analysis in
> parloops. (Of course you could use -floop-parallelize-all, but that also
> changes the heuristic). So I wrote a patch for parloops to use graphite
> dependency analysis by default (so without -floop-parallelize-all), but
> while testing found out that all the reduction test-cases started failing
> because the modifications graphite makes to the code messes up the parloops
> reduction analysis.
>
> Then I came up with this patch, which:
> - first runs a parloops pass, restricted to reduction loops only,
> - then runs graphite dependency analysis
> - followed by a normal parloops pass run.
>
> This way, we get to both:
> - compile the reduction testcases as before, and
> - profit from the better graphite dependency analysis otherwise.
>
> A point worth noting is that I stopped running pass_iv_canon before parloops
> (only in case of -ftree-parallelize-loops > 1) because running it before
> graphite makes the graphite scop detection fail.
>
> Bootstrapped and reg-tested on x86_64.
>
> Any comments?

Graphite dependence analysis is too slow to be enabled unconditionally
(read: hours in some simple cases - see bugzilla).

Richard.

> Thanks,
> - Tom
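
For readers who want to try the loop nests from the quoted test-case in C, here is a minimal sketch. It is an assumption-laden transliteration of the Fortran program above, not the contents of autopar/outer-1.c; the function names fill, count_mismatches and check are illustrative. Because C arrays are row-major, the indices are swapped relative to the Fortran x(jj, ii) so that the inner loop still walks contiguous memory.

```c
#include <assert.h>

#define N 500

/* Row-major C counterpart of the Fortran array x(0:n-1, 0:n-1). */
static int x[N][N];

/* First loop nest: every element is written exactly once and never read,
   so no iteration of the outer loop depends on another.  This is the
   loop the quoted message tries to parallelize.  */
void fill(void)
{
  for (int ii = 0; ii < N; ii++)
    for (int jj = 0; jj < N; jj++)
      x[ii][jj] = ii + jj + 3;
}

/* Second loop nest: verify the contents.  Unlike the Fortran original,
   which aborts on the first mismatch, this variant counts mismatches,
   which makes it a sum reduction.  */
int count_mismatches(void)
{
  int bad = 0;
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      if (x[i][j] != i + j + 3)
        bad++;
  return bad;
}

/* Fails via abort() on any mismatch (unless NDEBUG is defined),
   standing in for the "call abort" of the Fortran test.  */
void check(void)
{
  assert(count_mismatches() == 0);
}
```

Calling fill() and then check() from a main function reproduces the test. Building with -O2 -ftree-parallelize-loops=2, with or without -floop-parallelize-all, corresponds to the options named in the quoted message; no claim is made here about which of the two analyses parallelizes this C variant.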