On 12/10/2022 15:29, Tobias Burnus wrote:
On 29.09.22 18:24, Andrew Stubbs wrote:
On 27/09/2022 14:16, Tobias Burnus wrote:
Andrew did suggest a while back to piggyback on the console_output handling, avoiding another atomic access. If this is still wanted, I would like some guidance on how to actually implement it.
[...]
The point is that you can use the "msg" and "text" fields for whatever data you want, as long as you invent a new value for "type".
[....]
You can make "case 4" do whatever you want. There are enough bytes for 4 pointers, and you could use multiple packets (although it's not safe to assume they're contiguous or already arrived; maybe "case 4" for part 1, "case 5" for part 2). It's possible to change this structure, of course, but the target implementation is in newlib so versioning becomes a problem.
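To make the "new type value" idea concrete, here is a minimal sketch in C. It is not the libgomp or newlib code: the struct layout, the field order, the constant PKT_FOUR_POINTERS and both helper functions are hypothetical names for illustration. Only the "msg", "text" and "type" fields and the value 4 come from the discussion above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet layout; the real structure is shared with newlib. */
struct packet {
  char msg[128];
  int type;
  char text[128];
};

/* Hypothetical name for the newly invented type value ("case 4"). */
enum { PKT_FOUR_POINTERS = 4 };

static_assert (4 * sizeof (uint64_t) <= sizeof (((struct packet *) 0)->text),
               "four pointers must fit in the text field");

/* Device side (sketch): store four pointer-sized values in "text". */
static void
pack_pointers (struct packet *p, const uint64_t ptrs[4])
{
  memcpy (p->text, ptrs, 4 * sizeof (uint64_t));
  p->type = PKT_FOUR_POINTERS;
}

/* Host side (sketch): dispatch on "type"; return 1 if the packet was
   decoded into OUT, 0 if the type is not handled here.  */
static int
handle_packet (const struct packet *p, uint64_t out[4])
{
  switch (p->type)
    {
    case PKT_FOUR_POINTERS:
      memcpy (out, p->text, 4 * sizeof (uint64_t));
      return 1;
    default:
      return 0;
    }
}
```

Existing type values keep their meaning because only the new case reinterprets the payload, which is why the versioning concern is about changing the structure and not about adding a type.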

I think, also looking at the Newlib write.c implementation, that the data is contiguous: there is an atomic add where, instead of passing '1' for a single slot, I could pass '2' for two slots.

Right, sorry, the buffer is circular, but the counter is linear. It simplified reservation that way, but it does mean that there's a limit to the number of times the buffer can cycle before the counter saturates. (You'd need to stream out gigabytes of data to hit the limit though.)
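As an illustration of "circular buffer, linear counter", here is a small sketch in C11. It is not the newlib write.c code: QUEUE_SLOTS, struct ring, ring_reserve and ring_slot are hypothetical names, and the sketch does not model the limit on the counter mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical ring size, chosen only for this sketch. */
#define QUEUE_SLOTS 1024u

struct ring {
  atomic_uint next_output;   /* linear counter: it only ever increases */
};

/* Reserve N consecutive counter values with a single atomic add and
   return the first one.  Passing 2 instead of 1 reserves two slots.  */
static unsigned
ring_reserve (struct ring *r, unsigned n)
{
  assert (n > 0 && n <= QUEUE_SLOTS);
  return atomic_fetch_add (&r->next_output, n);
}

/* Map a linear counter value to its position in the circular buffer. */
static unsigned
ring_slot (unsigned linear_index)
{
  return linear_index % QUEUE_SLOTS;
}
```

Two slots reserved this way are adjacent in counter space, but the second one can wrap to index 0 of the array, so a consumer that processes them together has to apply the modulo to each slot separately.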

Attached is one variant. For the declaration of GOMP_OFFLOAD_target_rev, it needs the generic parts of the sister nvptx patch.*

2*128 bytes were not enough, I need 3*128 bytes. (Or rather 5*64 + 32.) As target_ext is blocking, I decided to use a stack local variable for the remaining arguments and pass it along. Alternatively, I could also use 2 slots - and process them together. This would avoid one device->host memory copy but would make console_output less clear.

PS: Currently, device stack variables are private and cannot be accessed from the host; this will change in a separate patch. It affects not only the "rest" part used in this patch but also the actual arrays behind addr, kinds, and sizes, and quite likely many of the map/firstprivate variables passed to addr.

As num_devices() will return 0 or -1, this is for now a non-issue.

So, the patch, as is, is known to be non-functional? How can you have tested it? For the addrs_sizes_kind data to be accessible the asm("s8") has to be wrong.

I think the patch looks good, in principle. The use of the existing ring-buffer is the right way to do it, IMO.

Can we get the manually allocated stacks patch in first and then follow up with these patches when they actually work?

Andrew
