On 10/23/2015 12:16 PM, Jakub Jelinek wrote:
On Thu, Oct 22, 2015 at 07:16:49PM +0200, Bernd Schmidt wrote:
I'm not really familiar with OpenMP and what it allows, so take all my
comments with a grain of salt.
So
[snip - really good example]
Thanks!
So what I was trying to describe as a problem would be something along
the lines of:
#pragma omp end declare target
int
main ()
{
  int a = 4, b = 5, c = 6, d = 7, e = 0, h = 0;
  #pragma omp target map(tofrom: a, c) firstprivate (b, d)
  {
    #pragma omp teams num_teams (6) thread_limit (33) shared(a, b)
    {
      #pragma omp parallel num_threads (24) shared (a, c, e)
      {
        int x[64], *xp = x;
        #pragma omp simd private (h) safelen(32) simdlen(32)
        for (int j = 0; j < 64; j++)
          {
            // if the assignment of xp was executed in lockstep by
            // all threads, then each thread stores into its own local
            // array rather than the one owned by the controlling thread
            xp[j] = j;
          }
        #pragma omp parallel num_threads (5)
        {
          // body not shown in the quoted example
        }
      }
    }
  }
  return 0;
}
Thus, if function-local variables in .shared are allowed, we'd need to emit
two copies of foo: one which assumes it runs in the teams context and one
which assumes it runs in the parallel context.
Well, I suppose you could keep track of a second stack pointer manually,
or disallow recursion and just use a static block. Then you can put your
data anywhere you like. There isn't very much space in .shared, but the
normal ptx stack isn't large either.
Bernd