I see no way to do this with affine scattering. We would need polynomial functions like Armin Grosslinger is doing, see http://www.infosun.fim.uni-passau.de/cl/publications/docs/MIP-0803.pdf but I think that's too slow at the moment...
On Fri, Dec 4, 2009 at 11:29 AM, Sebastian Pop <seb...@gmail.com> wrote: > On Fri, Dec 4, 2009 at 09:41, Toon Moene <t...@moene.org> wrote: >> I wonder if Graphite can do this one (or is planned to be able to): >> >> Another loop optimization that suggests itself >> is the following, trying to eliminate unnecessary >> loop constructs: >> \begin{verbatim} >> SUBROUTINE SUB(A, B, C, N, M) >> REAL A(N, M), B(N, M), C(N, M) >> DO J = 1, M >> DO I = 1, N >> C(I, J) = A(I, J) + B(I, J) >> ENDDO >> ENDDO >> END >> \end{verbatim} >> If we could generate code that looks like what is >> written below, we would have nothing but >> {\em one} loop. >> \begin{verbatim} >> DO K = 1, M*N >> C(K, 1) = A(K, 1) + B(K, 1) >> ENDDO >> \end{verbatim} >> In this case, this transformation can be done >> because the tuple $(i,j)$ forms an induction >> variable $i+n*(j-1)$ in its own right >> (replaced by $k$ in the {\em collapsed} loop). > > For the moment Graphite wouldn't do this kind of transform. But I think > that this could be done: CLooG should generate the following code if we > ask it to collapse the two loops: > > DO K = 1, M*N > I = K mod N > J = K / N > C(I, J) = A(I, J) + B(I, J) > ENDDO > > And then one would have to cleanup the scalar arithmetic to have > linearized array accesses, and that should already be done by GCC in > the lowering of memory accesses. > > Now a question for Cedric: how would you ask CLooG to generate this > collapsed loop? > > Thanks, > Sebastian >