On 11/10/2015 11:33 PM, Nathan Sidwell wrote:
> I've committed this patch to trunk. It implements a partitioning optimization for a loop partitioned over both vector and worker axes. We can elide the inner vector partitioning state propagation if there are no intervening instructions in the worker-partitioned outer loop other than the forking and joining. We simply execute the worker propagation on all vectors.
Patch LGTM, although I wonder if you really need the extra option rather than just optimize.
> I've been unable to introduce a testcase for this. The difficulty is that we want to check an RTL dump from the acceleration compiler, and there doesn't appear to be existing machinery for that in the testsuite. Perhaps something to be added later?
What's the difficulty exactly? Getting a dump should be possible with -foffload=-fdump-whatever. Does the testsuite have a problem finding the right filename?
Bernd