On 2019/10/8 10:05 PM, Thomas Schwinge wrote:
> Hi Chung-Lin!
>
> While we're all waiting for Tom to comment on this ;-) -- here's another
> item I realized:
>
> On 2019-09-10T19:41:59+0800, Chung-Lin Tang <chunglin_t...@mentor.com> wrote:
>> The libgomp nvptx plugin changes are also quite contained, with lots of
>> now unneeded [...] code deleted (since we no longer first cuAlloc a
>> buffer for the argument record before cuLaunchKernel)
>
> It would be nice ;-) -- but unless I'm confused, it's not that simple: we
> either have to reject (force host-fallback execution) or keep supporting
> "old-style" nvptx offloading code: new-libgomp has to continue to work
> with nvptx offloading code once generated by old-GCC.  Possibly even a
> mixture of old and new nvptx offloading code, if libraries are involved,
> huh!
>
> I have not completely thought that through, but I suppose this could be
> addressed by adding a flag to the 'struct nvptx_fn' (or similar) that's
> synthesized by nvptx 'mkoffload'?

Hi Thomas, Tom,
I've looked at the problem. It is unfortunate that we overlooked the
need for versioning of NVPTX images, and did not reserve a field in
'struct nvptx_tdata' for this purpose.

But how about something like:

typedef struct nvptx_tdata
{
  const struct targ_ptx_obj *ptx_objs;
  unsigned ptx_num;

  unsigned ptx_version;         /* <==== Add version field here.  */

  const char *const *var_names;
  unsigned var_num;

  const struct targ_fn_launch *fn_descs;
  unsigned fn_num;
} nvptx_tdata_t;

We currently only support x86_64 and powerpc64le hosts, which are both
LP64 targets.

Assuming that, the position above where I put the new 'ptx_version' field
is already a 32-bit alignment hole (the padding between 'ptx_num' and the
8-byte-aligned 'var_names' pointer), so adding the field doesn't change the
layout of the other fields. In the static 'target_data' variable generated
by mkoffload, that hole should be zeroed in currently circulating binaries
(unless binutils is not doing the intuitive thing...).
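
To make the layout argument concrete, here is a minimal sketch (not part of
the patch) that compares the current struct with the proposed one. The
'_old'/'_new' suffixes and the 'layouts_compatible' helper are names made up
for this illustration, and the two pointee structs are left opaque because
only pointers to them matter for the layout:

```c
#include <stddef.h>

/* Opaque here: only pointers to these types appear in the struct.  */
struct targ_ptx_obj;
struct targ_fn_launch;

/* Layout as in currently circulating images (no version field).  */
typedef struct nvptx_tdata_old
{
  const struct targ_ptx_obj *ptx_objs;
  unsigned ptx_num;
  /* On LP64: 4 bytes of padding here.  */
  const char *const *var_names;
  unsigned var_num;
  const struct targ_fn_launch *fn_descs;
  unsigned fn_num;
} nvptx_tdata_old_t;

/* Proposed layout: 'ptx_version' placed in the former padding.  */
typedef struct nvptx_tdata_new
{
  const struct targ_ptx_obj *ptx_objs;
  unsigned ptx_num;
  unsigned ptx_version;
  const char *const *var_names;
  unsigned var_num;
  const struct targ_fn_launch *fn_descs;
  unsigned fn_num;
} nvptx_tdata_new_t;

/* Return 1 if every pre-existing field keeps its offset, the total size
   is unchanged, and 'ptx_version' sits directly after 'ptx_num'.
   Expected to hold on LP64 hosts; not guaranteed elsewhere.  */
static int
layouts_compatible (void)
{
  return sizeof (nvptx_tdata_old_t) == sizeof (nvptx_tdata_new_t)
	 && offsetof (nvptx_tdata_old_t, ptx_num)
	    == offsetof (nvptx_tdata_new_t, ptx_num)
	 && offsetof (nvptx_tdata_old_t, var_names)
	    == offsetof (nvptx_tdata_new_t, var_names)
	 && offsetof (nvptx_tdata_old_t, var_num)
	    == offsetof (nvptx_tdata_new_t, var_num)
	 && offsetof (nvptx_tdata_old_t, fn_descs)
	    == offsetof (nvptx_tdata_new_t, fn_descs)
	 && offsetof (nvptx_tdata_old_t, fn_num)
	    == offsetof (nvptx_tdata_new_t, fn_num)
	 && offsetof (nvptx_tdata_new_t, ptx_version)
	    == offsetof (nvptx_tdata_old_t, ptx_num) + sizeof (unsigned);
}
```

This only checks the layout; whether the padding bytes are really zero in
existing binaries is a separate assumption that would have to be verified
by inspecting the mkoffload output.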

If these assumptions are safe, then we can treat existing images as having
ptx_version == 0, and from now on bump it to 1 for the new nvptx convention
changes.
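
On the plugin side, the check could look roughly like the sketch below. The
enum and the 'nvptx_image_convention' function are hypothetical names for
illustration only, not existing libgomp code, and rejecting unknown versions
(so the caller can fall back to host execution) is just one possible policy:

```c
#include <stddef.h>

/* Opaque here: only pointers to these types appear in the struct.  */
struct targ_ptx_obj;
struct targ_fn_launch;

typedef struct nvptx_tdata
{
  const struct targ_ptx_obj *ptx_objs;
  unsigned ptx_num;
  unsigned ptx_version;		/* 0 in images built before this change.  */
  const char *const *var_names;
  unsigned var_num;
  const struct targ_fn_launch *fn_descs;
  unsigned fn_num;
} nvptx_tdata_t;

/* Hypothetical classification of the launch convention of an image.  */
enum nvptx_launch_convention
{
  NVPTX_LAUNCH_OLD,		/* Argument record in a device buffer.  */
  NVPTX_LAUNCH_NEW,		/* New convention from this patch.  */
  NVPTX_LAUNCH_UNSUPPORTED	/* Unknown version: reject the image.  */
};

/* Map the version field of an image to the convention to use.  */
static enum nvptx_launch_convention
nvptx_image_convention (const nvptx_tdata_t *tdata)
{
  switch (tdata->ptx_version)
    {
    case 0:
      return NVPTX_LAUNCH_OLD;
    case 1:
      return NVPTX_LAUNCH_NEW;
    default:
      return NVPTX_LAUNCH_UNSUPPORTED;
    }
}
```

Keeping the decision per image would also cover the mixed case Thomas
mentions, where old and new offloading code are loaded by the same
libgomp.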

(We can do a similar thing in 'struct targ_fn_launch' if we want to
differentiate at a per-function level.)

Any considerations?

Thanks,
Chung-Lin
