Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

Nathan Sidwell Wed, 02 Dec 2015 06:14:56 -0800

On 12/02/15 08:46, Jakub Jelinek wrote:

> Or does the OpenACC execution model not allow anything like that, i.e.
> have some function with an automatic variable pass the address of that
> variable to some other function and that other function use #acc loop kind
> that expects the caller to be at the worker level and splits the work among
> the threads in the warp, on the array section pointed by that passed in
> pointer?  See the OpenMP testcase I've posted in this thread.


There are two cases to consider:

1) the caller (& address taker) is already partitioned. Thus the callers' frames are already copied. The caller takes the address of the object in its own frame.

An example would be calling, say, __muldc3, where the return value location is passed by pointer.
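As a rough illustration of case 1, here is a minimal C sketch. The helper mul_to and the function dot are hypothetical names, not anything from the patch series; mul_to merely stands in for a runtime routine that receives its return slot by pointer. The OpenACC clauses are an assumption about how such a loop might be written, and a compiler without OpenACC support simply ignores the pragmas.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: writes its result through a pointer, standing in
   for a runtime routine whose return value location is passed by pointer. */
#pragma acc routine seq
static void mul_to(double complex *out, double complex a, double complex b)
{
    *out = a * b;
}

/* Sum of element-wise products. The loop is partitioned, so every
   iteration runs in a thread that already has its own copy of the frame. */
double complex dot(const double complex *x, const double complex *y, size_t n)
{
    double complex sum = 0.0;
    #pragma acc parallel loop vector reduction(+:sum) copyin(x[0:n], y[0:n])
    for (size_t i = 0; i < n; i++) {
        double complex tmp;         /* lives in this thread's own frame */
        mul_to(&tmp, x[i], y[i]);   /* address taken while already partitioned */
        sum += tmp;
    }
    return sum;
}
```

Each thread only ever dereferences a pointer into its own frame here, which is why this case needs no special handling.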

2) the caller is not partitioned and calls a function containing a partitioned loop. The caller takes the address of its instance of the variable. As part of the RTL expansion we have to convert addresses (to be stored in registers) to the generic address space. That conversion creates a pointer that may be used by any thread (on the same CTA)[*]. The function call is executed by all threads (they're partially un-neutered before the call). In the partitioned loop, each thread ends up accessing the location in the frame of the original calling active thread.
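Case 2 might look like the following sketch at the source level. The names fill and caller are hypothetical, and the choice of clauses (num_gangs(1), routine vector) is an assumption made to keep the example small; it is not taken from the testcase mentioned above. Built without OpenACC, the pragmas are ignored and the code runs sequentially.

```c
#include <stddef.h>

/* Hypothetical routine containing a vector-partitioned loop. It only
   sees a pointer and cannot tell whose frame the pointer refers to. */
#pragma acc routine vector
void fill(int *buf, size_t n, int value)
{
    #pragma acc loop vector
    for (size_t i = 0; i < n; i++)
        buf[i] = value + (int)i;
}

int caller(void)
{
    int total = 0;
    #pragma acc parallel num_gangs(1) copy(total)
    {
        /* Automatic variable: the caller is not vector-partitioned here,
           so only the active thread's instance is meaningful. */
        int scratch[32];

        /* The address is taken in the unpartitioned caller; per the
           description above it is converted to a generic address, so
           every thread inside fill() writes to this one instance. */
        fill(scratch, 32, 1);

        for (int i = 0; i < 32; i++)
            total += scratch[i];
    }
    return total;
}
```

The point of the sketch is that fill() needs no knowledge of the caller's partitioning state: the pointer it receives is already usable from every thread.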

[*] although .local is private to each thread, it's placed in memory that is reachable from anywhere, provided a generic address is used. Essentially it's like TLS and genericization is simply adding the thread pointer to the local memory offset to create a generic address.
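The TLS analogy can be modelled on the host with a toy sketch. Everything here is an assumption made for illustration: the layout (one flat array split into equal per-thread windows), the sizes, and the names backing, genericize and demo are all hypothetical and say nothing about how the hardware actually arranges local memory.

```c
#include <stddef.h>

enum { NUM_THREADS = 4, LOCAL_BYTES = 64 };

/* Toy stand-in for memory that is reachable from anywhere and holds
   every thread's private local window back to back. */
unsigned char backing[NUM_THREADS * LOCAL_BYTES];

/* Toy genericization: per-thread base ("thread pointer") plus the
   offset within that thread's local window. */
unsigned char *genericize(unsigned tid, unsigned local_off)
{
    return backing + (size_t)tid * LOCAL_BYTES + local_off;
}

/* Thread 0 takes the address of its local at offset 8; all simulated
   threads then update that same location through the generic pointer. */
int demo(void)
{
    unsigned char *p = genericize(0, 8);
    for (unsigned tid = 0; tid < NUM_THREADS; tid++)
        *p += (unsigned char)(tid + 1);
    return backing[0 * LOCAL_BYTES + 8];
}
```

In this model the same local offset in two different threads yields two different generic addresses, while a generic address, once formed, names a single location no matter which thread dereferences it.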


nathan
