Hi Sebastian,

On 05/26/2011 07:20 PM, Sebastian Pop wrote:
> I see.  Would it be possible to strip mine the reduction loop that you say
> not handled yet, and then translate to opencl the partial sums?

We believe it would be quite difficult. The problem can be divided into three
sub-problems:
1. Allocating additional storage for the privatized variables.
2. Performing the actual privatization in the intermediate representation.
3. Selecting eligible loop nests based on a set of heuristics.

For example, the loop
for (i = 0; i < 10000; i++)
        sum += A[i];

can be strip-mined to
for (i1 = 0; i1 < 10000; i1 += 100)
        for (i2 = i1; i2 < i1 + 100; i2++)
                sum += A[i2];

In the strip-mined loop we can replace the outer loop with a kernel launch (the inner loop becomes the kernel's body), but we need additional storage for the partial sums.

Also, we are currently working on another project (an OpenCL implementation for FPGA), so we can work on graphite-opencl only part-time (mostly support and bug fixing).
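To make the storage requirement concrete, here is a rough sketch in plain C of what the privatized form of the strip-mined loop would look like. This is only an illustration, not what graphite-opencl generates: the names `partial`, `STRIP`, `NSTRIPS` and `strip_mined_sum` are made up for the example, and the outer loop merely stands in for the kernel launch.

```c
#define N 10000
#define STRIP 100               /* strip size, as in the strip-mined loop */
#define NSTRIPS (N / STRIP)     /* number of outer iterations (hypothetical name) */

/* Sum A[0..N) using one private partial sum per strip. */
long strip_mined_sum(const int *A)
{
    long partial[NSTRIPS];      /* the additional storage for the partial sums */
    long sum = 0;

    /* Outer iterations no longer share 'sum', so they are independent;
       each one stands in for a single work-item of the kernel. */
    for (int i1 = 0; i1 < N; i1 += STRIP) {
        long p = 0;             /* privatized copy of the accumulator */
        for (int i2 = i1; i2 < i1 + STRIP; i2++)
            p += A[i2];
        partial[i1 / STRIP] = p;
    }

    /* Final sequential reduction over the partial sums. */
    for (int s = 0; s < NSTRIPS; s++)
        sum += partial[s];
    return sum;
}
```

The `partial` array corresponds to sub-problem 1 and the rewrite of `sum` into `p` to sub-problem 2. Deciding when such a rewrite is worthwhile (sub-problem 3) is not shown here.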

--
Alexey Kravets
kayr...@ispras.ru
