Daniel Berlin wrote: > Thus, it is algorithmically unsound in it's current form :) > Again, this is *only* the piece that attempts to convert loops > to perfect nests. > This is in fact, not terribly surprising, since the algorithm used was the > result of > Sebastian and I sitting at my whiteboard for 30 minutes trying to figure out > what we'd need to do to make swim happy :). >
:) > At one point, the only thing it would interchange was incredibly simple loops, > which worked out okay, and some simple perfect nests (which is where swim > falls into). > Everything was more or less happy. > > As our optimizers got better at cleaning up code, it seems to have started > to transform more and more perfect nests that violate the safety conditions, > but it doesn't know about it because the analysis necessary to see this is > just not there. > > It could be made sound by implementing these analysis, but doing so is > just as much work as implementing something that isn't a complete hack, > like loop distribution. At that point, you may as well just do that (and in > fact, the > analysis you need to implement is the basis of one of the common loop > distribution algorithms). > I think that the solution to these problems is to perform a more careful analysis of the data dependences before deciding to move the code out of the loop. I'll try to come back with a patch to the PR that still allows the interchange for the swim case. > [1] Note that before anyone asks, doing iteration space code generation on > loops that > are not perfect nests is possible, but is a much more computationally > expensive idea > Plus, in the general case, it requires generation of tons of guard statements, > which may make the transformed loop more expensive than the original. This > is a *hard* > problem, and actively researched. Right, some techniques are experimental, but other have been implemented and used in the open64/orc compiler at INRIA: Albert Cohen, Georges-Andre Silber and I will speak about the WRAP-IT experience at the GCC summit. The plan is to extend the lambda code to also capture other loop transforms that also need the notion of schedule: fusion, fission, tiling, etc. For this I'll open a new branch, and will not help for promptly solving our PRs. So let's solve the current PRs before creating new ones ;) Sebastian