Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling

Andrew Stubbs Thu, 29 Sep 2022 09:24:36 -0700

On 27/09/2022 14:16, Tobias Burnus wrote:

@@ -422,6 +428,12 @@ struct agent_info
      if it has been.  */
   bool initialized;

+  /* Flag whether the HSA program that consists of all the modules has been
+     finalized.  */
+  bool prog_finalized;
+  /* Flag whether the HSA OpenMP's requires_reverse_offload has been used.  */
+  bool has_reverse_offload;
+
   /* The instruction set architecture of the device. */
   gcn_isa device_isa;
   /* Name of the agent. */
@@ -456,9 +468,6 @@ struct agent_info
      thread should have locked agent->module_rwlock for reading before
      acquiring it.  */
   pthread_mutex_t prog_mutex;
-  /* Flag whether the HSA program that consists of all the modules has been
-     finalized.  */
-  bool prog_finalized;
   /* HSA executable - the finalized program that is used to locate kernels.  */
   hsa_executable_t executable;
 };


Why has prog_finalized been moved?

Andrew did suggest a while back to piggyback on the console_output handling,
avoiding another atomic access. If this is still wanted, I'd like some
guidance on how to actually implement it.


The console output ring buffer has the following type:

   struct output {
     int return_value;
     unsigned int next_output;
     struct printf_data {
       int written;
       char msg[128];
       int type;
       union {
         int64_t ivalue;
         double dvalue;
         char text[128];
       };
     } queue[1024];
     unsigned int consumed;
   } output_data;

That is, for each entry in the buffer there is a 128-byte message string, an integer argument-type identifier, and a 128-byte argument field. Before we had printf we had functions that could print string+int (gomp_print_integer, type==0), string+double (gomp_print_double, type==1) and string+string (gomp_print_string, type==2). The string conversion could then be done on the host to keep the target code simple. These would still be useful functions if you want to dump debug output quickly without affecting performance so much, but I don't think they ever got upstreamed because somebody (who should have known better!) created an unrelated function upstream with the same name (gomp_print_string), and we already had working printf by then, so the effort to fix it wasn't worth it.
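To illustrate that division of labour, here is a rough sketch (not the newlib or libgomp code) of a device-side helper that only copies a string and a number into a slot, and a host-side drain loop that does the number-to-text conversion. The names claim_slot, sketch_print_integer, sketch_print_double and sketch_drain are hypothetical; the drain writes into a caller buffer instead of stdout so it is easy to check, it assumes a single host thread drains the buffer, and overflow handling is left out.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { QUEUE_LEN = 1024 };

/* Same layout as the output_data ring buffer quoted above.  */
struct printf_data {
  int written;
  char msg[128];
  int type;
  union {
    int64_t ivalue;
    double dvalue;
    char text[128];
  };
};

struct output {
  int return_value;
  unsigned int next_output;
  struct printf_data queue[QUEUE_LEN];
  unsigned int consumed;
};

/* Device side (sketch): reserve the next slot.  No overflow check.  */
static struct printf_data *
claim_slot (struct output *out)
{
  unsigned int i = __atomic_fetch_add (&out->next_output, 1, __ATOMIC_RELAXED);
  return &out->queue[i % QUEUE_LEN];
}

/* Device side: copy the raw string and value, then publish.  The target
   does no formatting at all.  */
static void
sketch_print_integer (struct output *out, const char *msg, int64_t value)
{
  struct printf_data *d = claim_slot (out);
  strncpy (d->msg, msg, sizeof d->msg);
  d->ivalue = value;
  d->type = 0;
  __atomic_store_n (&d->written, 1, __ATOMIC_RELEASE);
}

static void
sketch_print_double (struct output *out, const char *msg, double value)
{
  struct printf_data *d = claim_slot (out);
  strncpy (d->msg, msg, sizeof d->msg);
  d->dvalue = value;
  d->type = 1;
  __atomic_store_n (&d->written, 1, __ATOMIC_RELEASE);
}

/* Host side: format each published entry into BUF and mark it consumed.
   Returns the number of entries consumed.  */
static unsigned int
sketch_drain (struct output *out, char *buf, size_t bufsize)
{
  unsigned int n = 0;
  size_t used = 0;
  unsigned int to = __atomic_load_n (&out->next_output, __ATOMIC_ACQUIRE);
  buf[0] = '\0';
  for (unsigned int i = out->consumed; i != to && used < bufsize; i++)
    {
      struct printf_data *d = &out->queue[i % QUEUE_LEN];
      if (!__atomic_load_n (&d->written, __ATOMIC_ACQUIRE))
        break;                  /* Claimed but not yet filled in.  */
      int len;
      switch (d->type)
        {
        case 0: len = snprintf (buf + used, bufsize - used, "%.128s%" PRId64 "\n", d->msg, d->ivalue); break;
        case 1: len = snprintf (buf + used, bufsize - used, "%.128s%f\n", d->msg, d->dvalue); break;
        case 2: len = snprintf (buf + used, bufsize - used, "%.128s%.128s\n", d->msg, d->text); break;
        default: len = snprintf (buf + used, bufsize - used, "GCN print buffer error!\n"); break;
        }
      if (len > 0)
        used += (size_t) len;
      d->written = 0;
      __atomic_store_n (&out->consumed, i + 1, __ATOMIC_RELEASE);
      n++;
    }
  return n;
}
```

The device-side cost is one atomic increment plus a couple of copies; everything printf-like happens on the host.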

The current printf implementation (actually the write syscall) uses type==3 to print 256 bytes of output per packet, with no implied newline.
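A minimal sketch of how a write() of arbitrary length could be split into type==3 packets, assuming each packet carries up to 128 bytes in msg and the next 128 in text. Slot claiming is left out, and sketch_write and sketch_print_type3 are hypothetical names. Zero-filling each packet first is what lets the host's "%.128s" stop at the end of a short chunk.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Only the fields of a ring-buffer entry that a type-3 packet uses.  */
struct packet {
  int written;
  char msg[128];
  int type;
  char text[128];
};

/* Device side (sketch): split LEN bytes into packets of at most 256
   bytes (128 in msg, 128 in text).  Returns the number of packets used.  */
static size_t
sketch_write (struct packet *q, size_t qlen, const char *data, size_t len)
{
  size_t n = 0;
  while (len > 0 && n < qlen)
    {
      size_t chunk = len < 256 ? len : 256;
      size_t first = chunk < 128 ? chunk : 128;
      struct packet *p = &q[n++];
      /* Zero-fill so a short chunk is NUL-terminated.  */
      memset (p->msg, 0, sizeof p->msg);
      memset (p->text, 0, sizeof p->text);
      memcpy (p->msg, data, first);
      memcpy (p->text, data + first, chunk - first);
      p->type = 3;
      p->written = 1;           /* Publish.  */
      data += chunk;
      len -= chunk;
    }
  return n;
}

/* Host side: reproduce the "case 3" output (no newline added) into BUF.  */
static void
sketch_print_type3 (const struct packet *q, size_t n, char *buf, size_t bufsize)
{
  size_t used = 0;
  buf[0] = '\0';
  for (size_t i = 0; i < n && used < bufsize; i++)
    {
      int len = snprintf (buf + used, bufsize - used, "%.128s%.128s",
                          q[i].msg, q[i].text);
      if (len < 0)
        break;
      used += (size_t) len;
    }
}
```

Because the host uses "%s" conversions, any NUL byte inside the written data would cut that packet's output short.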

The point is that you can use the "msg" and "text" fields for whatever data you want, as long as you invent a new value for "type".


The current loop has:

  switch (data->type)
    {
    case 0: printf ("%.128s%ld\n", data->msg, data->ivalue); break;
    case 1: printf ("%.128s%f\n", data->msg, data->dvalue); break;
    case 2: printf ("%.128s%.128s\n", data->msg, data->text); break;
    case 3: printf ("%.128s%.128s", data->msg, data->text); break;
    default: printf ("GCN print buffer error!\n"); break;
    }

You can make "case 4" do whatever you want. There are enough bytes for 4 pointers, and you could use multiple packets (although it's not safe to assume they're contiguous or that they have all arrived; maybe "case 4" for part 1, "case 5" for part 2). It's possible to change this structure, of course, but the target implementation is in newlib, so versioning becomes a problem.
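As a rough illustration of a single-packet "case 4", the sketch below copies four pointer-sized values (32 bytes) into msg and decodes them on the host. The struct rev_request, its field names and TYPE_REV_REQUEST are hypothetical, not part of libgomp or newlib; uint64_t is used instead of void * on the assumption that it keeps the layout identical for the device and host compilers.

```c
#include <stdint.h>
#include <string.h>

/* Ring-buffer entry as quoted above.  */
struct printf_data {
  int written;
  char msg[128];
  int type;
  union {
    int64_t ivalue;
    double dvalue;
    char text[128];
  };
};

/* Hypothetical payload: four pointer-sized values.  */
struct rev_request {
  uint64_t fn;
  uint64_t hostaddrs;
  uint64_t sizes;
  uint64_t kinds;
};

enum { TYPE_REV_REQUEST = 4 };  /* Hypothetical new "type" value.  */

_Static_assert (sizeof (struct rev_request)
                <= sizeof ((struct printf_data *) 0)->msg,
                "payload must fit in the msg field");

/* Device side: store the raw bytes in msg, then publish the entry.  */
static void
sketch_send_request (struct printf_data *d, const struct rev_request *req)
{
  memcpy (d->msg, req, sizeof *req);
  d->type = TYPE_REV_REQUEST;
  __atomic_store_n (&d->written, 1, __ATOMIC_RELEASE);
}

/* Host side: the extra case in the console-output switch.  Returns 1 and
   fills *REQ if D holds a published request, otherwise 0.  */
static int
sketch_receive_request (struct printf_data *d, struct rev_request *req)
{
  if (!__atomic_load_n (&d->written, __ATOMIC_ACQUIRE))
    return 0;
  switch (d->type)
    {
    case TYPE_REV_REQUEST:
      memcpy (req, d->msg, sizeof *req);
      d->written = 0;           /* Slot can be reused.  */
      return 1;
    default:
      return 0;                 /* Cases 0-3: ordinary console output.  */
    }
}
```

memcpy is used in both directions so the payload is treated as raw bytes, never as a string.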

Reusing this would remove the need for has_reverse_offload, since the console output is scanned anyway, and would also eliminate rev_ptr and rev_data. It also means that, hypothetically, the device can queue up reverse offload requests asynchronously in the ring buffer (you'd need to ensure multi-part packets don't get interleaved, though).
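One possible way to keep multi-part packets from interleaving, sketched under assumptions: the device reserves both slots with a single fetch-add of 2, so no other thread's packet can land between part 1 and part 2, and the host consumes part 1 only once part 2 has also been published, otherwise it stops and retries on the next scan. The type values 4 and 5 follow the "case 4"/"case 5" idea above; everything else (sketch_claim_pair, sketch_publish, sketch_drain, struct pair) is hypothetical, and overflow checks are omitted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type values for the two halves of one request.  */
enum { QUEUE_LEN = 1024, TYPE_PART1 = 4, TYPE_PART2 = 5 };

/* Simplified ring: only the fields this sketch needs.  */
struct packet {
  int written;
  char msg[128];
  int type;
};

struct ring {
  unsigned int next_output;
  struct packet queue[QUEUE_LEN];
  unsigned int consumed;
};

/* Device side: one fetch-add of 2 reserves two adjacent slots.  */
static unsigned int
sketch_claim_pair (struct ring *r)
{
  return __atomic_fetch_add (&r->next_output, 2, __ATOMIC_RELAXED);
}

/* Device side: fill and publish one slot.  */
static void
sketch_publish (struct ring *r, unsigned int idx, int type,
                const void *payload, size_t len)
{
  struct packet *p = &r->queue[idx % QUEUE_LEN];
  memcpy (p->msg, payload, len < sizeof p->msg ? len : sizeof p->msg);
  p->type = type;
  __atomic_store_n (&p->written, 1, __ATOMIC_RELEASE);
}

struct pair { char part1[128]; char part2[128]; };

/* Host side: consume entries in order.  A part-1 packet is consumed only
   together with its part 2; if part 2 is not published yet, stop and leave
   both slots for the next scan.  Returns the number of requests copied.  */
static unsigned int
sketch_drain (struct ring *r, struct pair *out, unsigned int max)
{
  unsigned int n = 0;
  unsigned int i = r->consumed;
  unsigned int to = __atomic_load_n (&r->next_output, __ATOMIC_ACQUIRE);
  while (i != to)
    {
      struct packet *a = &r->queue[i % QUEUE_LEN];
      if (!__atomic_load_n (&a->written, __ATOMIC_ACQUIRE))
        break;                  /* Claimed but not filled in yet.  */
      unsigned int step = 1;
      if (a->type == TYPE_PART1)
        {
          struct packet *b = &r->queue[(i + 1) % QUEUE_LEN];
          if (n == max || !__atomic_load_n (&b->written, __ATOMIC_ACQUIRE))
            break;              /* Part 2 not here yet: retry later.  */
          memcpy (out[n].part1, a->msg, sizeof a->msg);
          memcpy (out[n].part2, b->msg, sizeof b->msg);
          n++;
          b->written = 0;
          step = 2;
        }
      /* Other types would be printed here, as in the existing switch.  */
      a->written = 0;
      i += step;
      __atomic_store_n (&r->consumed, i, __ATOMIC_RELEASE);
    }
  return n;
}
```

Reserving both slots in one atomic step is what rules out interleaving; the host-side check for part 2 covers the "not already arrived" case.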


Andrew
