jhuber6 added a comment.

In D123471#3446751 <https://reviews.llvm.org/D123471#3446751>, @yaxunl wrote:

> HIP is considering a unified device binary embedding scheme with OpenMP. 
> However, some large MI frameworks are compiled with -fno-gpu-rdc. If 
> compiling with -fgpu-rdc, the linking time will increase significantly since 
> the post-linking optimizations take much longer with the large linked 
> IR. Therefore, it would be desirable if the new OpenMP device binary 
> embedding scheme supports -fno-gpu-rdc mode.

This work should be very close to that. The new driver allows us to link 
everything together so OpenMP can call HIP / CUDA functions and vice versa. I 
have done some preliminary tests with registering CUDA device variables with 
OpenMP. The only change required is to store these offloading entries in 
`omp_offloading_entries`, and the OpenMP runtime will pick them up and try to 
register them. This method allows us to compile HIP / CUDA with OpenMP, but 
since we would be registering two different images, each would have its own 
state. For full interoperability we'd need some way to make either HIP / CUDA 
or OpenMP "borrow" the other one's registered image so they can share state.
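
To make the registration step concrete, here is a minimal sketch of a runtime 
walking a contiguous range of offload entries, as it might for the contents of 
`omp_offloading_entries`. The entry layout follows the struct quoted later in 
this comment (without `owner`, and with `name` made `const` so string literals 
can be used). `Registry` and `registerRange` are hypothetical names for 
illustration, not the libomptarget API, and the range bounds are passed in 
explicitly rather than obtained from the linker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Entry layout as quoted in this thread, minus the proposed `owner` field.
struct OffloadEntry {
  void *addr;       // Host address of the function or global.
  const char *name; // Name of the function or global.
  size_t size;      // Size of the global, 0 for a function.
  int32_t flags;
  int32_t reserved;
};

// Hypothetical stand-in for per-image runtime state.
struct Registry {
  std::vector<std::string> kernels;
  std::vector<std::string> globals;
};

// Walk [begin, end) and record each entry in the given registry.
inline void registerRange(Registry &r, const OffloadEntry *begin,
                          const OffloadEntry *end) {
  assert(begin <= end && "entry range must not be reversed");
  for (const OffloadEntry *e = begin; e != end; ++e) {
    if (e->size == 0)
      r.kernels.push_back(e->name); // Size 0 marks a function.
    else
      r.globals.push_back(e->name);
  }
}
```

Each `Registry` instance only sees what was registered into it, which is the 
same reason two separately registered images end up with separate state.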

> That said, I think this new scheme may work for -fno-gpu-rdc, probably with 
> some minor changes.

My understanding is that non-RDC builds do all the registration per-TU. If 
so, we should be able to link them as we do now, since they won't emit any 
device code that needs to be linked. Individual files could then specify 
no-rdc and would not be touched by the later device linker run.

> For -fno-gpu-rdc, each TU has its own device binary, so the device binaries 
> in the final image would be per GPU and per TU. That seems not a big problem 
> since they can be post-fixed with a unique ID for each TU.
>
> Different offload entries may have the same name in different TU's, therefore 
> an offload entry may not be uniquely identified by its name. To uniquely 
> identify an offload entry, it needs its name and the pointer to its belonging 
> device binary. Therefore, it would be desirable to have one extra field 
> 'owner':
>
>   struct __tgt_offload_entry {
>     void    *addr;      // Pointer to the offload entry info.
>                         // (function or global)
>     char    *name;      // Name of the function or global.
>     size_t  size;       // Size of the entry info (0 if it is a function).
>     int32_t flags;
>     void    *owner;     // Pointer to the device binary containing this
>                         // offload entry.
>     int32_t reserved;
>   };
>
> It may be possible to use the `reserved` field for that purpose. However, it 
> is unclear whether `reserved` will be used for some other purpose later.

For OpenMP we use an `exec_mode` global to control some aspects of kernel 
execution, and there's a possibility we'd want to put it in the reserved field 
instead. We could add more fields to this struct, but that would break the 
ABI. We could work around that, but it would add some complexity.
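
As a rough illustration of the ABI concern, the sketch below compares the 
existing five-field layout with the proposed one. `EntryV1`, `EntryWithOwner`, 
and `countSeenByOldRuntime` are hypothetical names used only here. The point is 
that a consumer built against the old layout would compute the wrong stride and 
field offsets when handed entries in the new layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Existing layout: the quoted struct without `owner`.
struct EntryV1 {
  void *addr;
  char *name;
  size_t size;
  int32_t flags;
  int32_t reserved;
};

// Proposed layout with the extra `owner` field.
struct EntryWithOwner {
  void *addr;
  char *name;
  size_t size;
  int32_t flags;
  void *owner;
  int32_t reserved;
};

// Both the element size and the position of `reserved` change.
static_assert(sizeof(EntryWithOwner) > sizeof(EntryV1),
              "adding a field grows the entry");
static_assert(offsetof(EntryWithOwner, reserved) !=
                  offsetof(EntryV1, reserved),
              "`reserved` moves when `owner` is inserted before it");

// Number of entries an old consumer would believe it sees when the
// section actually holds `n` entries in the new layout.
inline size_t countSeenByOldRuntime(size_t n) {
  size_t bytes = n * sizeof(EntryWithOwner);
  return bytes / sizeof(EntryV1);
}
```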

> Another choice is to let addr point to a struct which contains owner info. 
> However, that would introduce another level of indirection.

Yeah, I think for arbitrary extensions that would be the easiest way without 
breaking the ABI. We could use the reserved field to indicate if we have some 
"extension" there.
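
A minimal sketch of that idea, under assumptions: the entry keeps its current 
five-field layout, a marker value in `reserved` says that `addr` points at an 
extension record, and the record carries the real address plus the owner. 
`kHasExtension`, `EntryExtension`, and the helper functions are hypothetical 
names and not part of libomptarget.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Unchanged entry layout (`name` is const here so literals can be used).
struct OffloadEntry {
  void *addr;
  const char *name;
  size_t size;
  int32_t flags;
  int32_t reserved;
};

// Hypothetical marker value stored in `reserved`.
constexpr int32_t kHasExtension = 1;

// Hypothetical record that `addr` points to when the marker is set.
struct EntryExtension {
  void *addr;  // The real function or global address.
  void *owner; // Device binary this entry belongs to.
};

// Resolve the real address, following the extra indirection if present.
inline void *entryAddr(const OffloadEntry &e) {
  if (e.reserved == kHasExtension) {
    assert(e.addr && "extension marker set but no record");
    return static_cast<EntryExtension *>(e.addr)->addr;
  }
  return e.addr;
}

// Return the owning device binary, or nullptr for plain entries.
inline void *entryOwner(const OffloadEntry &e) {
  if (e.reserved == kHasExtension) {
    assert(e.addr && "extension marker set but no record");
    return static_cast<EntryExtension *>(e.addr)->owner;
  }
  return nullptr;
}

// Entries match only if both the name and the owner agree.
inline bool sameEntry(const OffloadEntry &a, const OffloadEntry &b) {
  return std::strcmp(a.name, b.name) == 0 && entryOwner(a) == entryOwner(b);
}
```

With this shape, two TUs can both define an entry named `foo` and still be 
told apart by owner, at the cost of one extra pointer dereference per lookup.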

I think we're working through some similar stuff. I haven't worked much with 
HIP but I think there would be some benefit to bringing this all under the new 
driver I've been working on for OpenMP. Let me know if you want to collaborate 
on something for getting this to work with HIP.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123471/new/

https://reviews.llvm.org/D123471

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
