On 12/10/2022 15:29, Tobias Burnus wrote:
On 29.09.22 18:24, Andrew Stubbs wrote:
On 27/09/2022 14:16, Tobias Burnus wrote:
Andrew did suggest a while back to piggyback on the console_output
handling, avoiding another atomic access. If this is still wanted, I'd
like to have some guidance regarding how to actually implement it.
[...]
The point is that you can use the "msg" and "text" fields for whatever
data you want, as long as you invent a new value for "type".
[....]
You can make "case 4" do whatever you want. There are enough bytes for
4 pointers, and you could use multiple packets (although it's not safe
to assume they're contiguous or already arrived; maybe "case 4" for
part 1, "case 5" for part 2). It's possible to change this structure,
of course, but the target implementation is in newlib so versioning
becomes a problem.
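
A minimal sketch of the "new type value" idea, for illustration only. The struct layout, the value 4, and the names rev_args, pack_rev and handle_packet are all assumptions made up for this example; they are not the actual newlib or libgomp definitions. It only shows four pointer-sized values being carried in the 128-byte "text" field and decoded by a new switch case:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TEXT_BYTES 128           /* assumed payload size per packet */

/* Hypothetical type values; only "a new value" matters here. */
enum { PKT_TEXT = 2, PKT_REV_OFFLOAD = 4 };

/* Hypothetical packet layout, loosely modelled on the description above. */
struct packet {
  int type;
  char msg[TEXT_BYTES];
  char text[TEXT_BYTES];
};

/* Hypothetical payload: four pointer-sized values. */
struct rev_args {
  uint64_t fn, addrs, sizes, kinds;
};

static_assert(sizeof(struct rev_args) <= TEXT_BYTES,
              "payload must fit in one packet");

/* Device side: store the payload and tag the packet with the new type. */
static void pack_rev(struct packet *p, const struct rev_args *a)
{
  memcpy(p->text, a, sizeof *a);
  p->type = PKT_REV_OFFLOAD;
}

/* Host side: returns 1 and fills *out if the packet is a reverse-offload
   request, 0 for any other type (console text is not modelled here). */
static int handle_packet(const struct packet *p, struct rev_args *out)
{
  switch (p->type) {
  case PKT_REV_OFFLOAD:
    memcpy(out, p->text, sizeof *out);
    return 1;
  default:
    return 0;
  }
}
```

Existing type values keep their meaning in this scheme, so a consumer that does not know the new value would simply not recognise the packet.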
I think, also looking at the Newlib write.c implementation, that the
data is contiguous: there is an atomic add, where instead of passing '1'
for a single slot, I could also add '2' for two slots.
Right, sorry, the buffer is circular, but the counter is linear. It
simplified reservation that way, but it does mean that there's a limit
to the number of times the buffer can cycle before the counter
saturates. (You'd need to stream out gigabytes of data to hit the limit
though.)
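
To make the "circular buffer, linear counter" scheme concrete, here is a rough sketch. The names ring, ring_reserve and ring_slot and the size of 8 slots are hypothetical, picked for the example; this is not the newlib code. One atomic add hands out one or more consecutive counter values, and each value is mapped to a slot modulo the buffer size:

```c
#include <assert.h>
#include <stdatomic.h>

#define RING_SLOTS 8u            /* hypothetical size, for illustration */

struct slot {
  int type;
  char msg[128];
  char text[128];
};

struct ring {
  atomic_uint next;              /* linear counter: only ever incremented */
  struct slot queue[RING_SLOTS]; /* circular storage */
};

/* Reserve n consecutive counter values with a single atomic add and
   return the first one. */
static unsigned ring_reserve(struct ring *r, unsigned n)
{
  assert(n >= 1 && n <= RING_SLOTS);
  return atomic_fetch_add(&r->next, n);
}

/* Map a linear counter value onto the circular buffer. */
static struct slot *ring_slot(struct ring *r, unsigned index)
{
  return &r->queue[index % RING_SLOTS];
}
```

In this sketch a two-slot reservation gets adjacent counter values, but when the first of them maps to the last slot the second wraps round to slot 0, so the pair is adjacent in memory only when the reservation does not straddle the end of the buffer.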
Attached is one variant – for the decl of the GOMP_OFFLOAD_target_rev,
it needs the generic parts of the sister nvptx patch.*
2*128 bytes were not enough, I need 3*128 bytes. (Or rather 5*64 + 32.)
As target_ext is blocking, I decided to use a stack local variable for
the remaining arguments and pass it along. Alternatively, I could also
use 2 slots - and process them together. This would avoid one
device->host memory copy but would make console_output less clear.
PS: Currently, device stack variables are private and cannot be accessed
from the host; this will change in a separate patch. It not only affects
the "rest" part as used in this patch but also the actual arrays behind
addr, kinds, and sizes. And quite likely a lot of the map/firstprivate
variables passed to addr.
As num_devices() will return 0 or -1, this is for now a non-issue.
So, the patch, as is, is known to be non-functional? How can you have
tested it? For the addrs_sizes_kind data to be accessible, the asm("s8")
has to be wrong.
I think the patch looks good, in principle. The use of the existing
ring-buffer is the right way to do it, IMO.
Can we get the manually allocated stacks patch in first and then follow
up with these patches when they actually work?
Andrew