On 06/22/2015 04:24 PM, Jakub Jelinek wrote:
> I don't understand why lowering the way you suggest helps here at all.
> In the proposed scheme, you essentially have whole function
> in e.g. worker-single or vector-single mode, which you need to be able to
> handle properly in any case, because users can write such routines
> themselves.  And then you can have a loop in such a function that
> has some special attribute, a hint that it is desirable to vectorize it
> (for PTX the PTX way) or use vector-single mode for it in a worker-single
> function.

You can have a hint that it is desirable, but not a hint that it is correct, because intervening passes may invalidate it. The OpenACC directives guarantee to the compiler that the program can be transformed into a parallel form. If we lose them early, we must rely on our own analysis, which may not be strong enough to prove that the loop can be parallelized. If we make these transformations early enough, while we still have the OpenACC directives, we can guarantee that we do exactly what the programmer specified.
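
As a rough illustration of that point (not taken from the thread), consider a loop that writes through an index array. The function name scatter_add and its parameters are hypothetical, and the sketch assumes every value in idx is distinct and less than n. The directive is what tells the compiler the iterations are independent; analysis of the loop body alone generally cannot prove that, since it cannot see the contents of idx.

```c
#include <stddef.h>

/* Hypothetical example: add in[i] into out[idx[i]].
 *
 * Assumption made by the caller, not checked here: all idx[i] are
 * distinct and in the range [0, n). Under that assumption no two
 * iterations touch the same element of out. */
void scatter_add(float *out, const float *in, const int *idx, size_t n)
{
    /* The "independent" clause is the programmer's guarantee that the
     * iterations can run in parallel. Without the directive, a compiler
     * would have to prove that idx holds no duplicates, which it
     * normally cannot do. A compiler built without OpenACC support
     * ignores the pragma and runs the loop sequentially. */
    #pragma acc parallel loop independent copy(out[0:n]) copyin(in[0:n], idx[0:n])
    for (size_t i = 0; i < n; i++)
        out[idx[i]] += in[i];
}
```

If the directive were reduced to a mere hint before the loop is transformed, a later pass would be left with only the loop body, where the independence of the writes to out is no longer visible.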


Bernd
