================
@@ -568,32 +590,45 @@ void createRegisterFatbinFunction(Module &M, GlobalVariable *FatbinDesc,
 
 } // namespace
 
-Error wrapOpenMPBinaries(Module &M, ArrayRef<ArrayRef<char>> Images) {
-  GlobalVariable *Desc = createBinDesc(M, Images);
+Error OffloadWrapper::wrapOpenMPBinaries(
+    Module &M, ArrayRef<ArrayRef<char>> Images,
+    std::optional<EntryArrayTy> EntryArray) const {
+  GlobalVariable *Desc = createBinDesc(
+      M, Images,
+      EntryArray
+          ? *EntryArray
+          : offloading::getOffloadEntryArray(M, "omp_offloading_entries"),
----------------
fabianmcg wrote:

I see what you mean. First, some broader context: this patch is part of a 
patch series that adds GPU compilation for OMP operations in MLIR without 
requiring `flang` or `clang`, which is not currently possible. The series 
also makes it possible to JIT OMP operations in MLIR. The goal of the series 
is to make OMP target offloading functional in MLIR on its own.

I allow passing a custom entry array because ORC JIT doesn't fully support 
the `__start`/`__stop` symbols used for grouping section data. My solution is 
to accept a custom entry array, so in MLIR I build the full entry array 
myself and never rely on sections; this applies to OMP, CUDA and HIP. A 
sketch of how a caller could use the new parameter is below.
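
To make the intended use concrete, here is a minimal sketch of how a 
JIT-oriented caller could hand the wrapper an explicitly built entry array 
instead of letting it fall back to the section-based 
`omp_offloading_entries` lookup. The header path, the unqualified 
`OffloadWrapper` spelling, the `EntryArrayTy` being a `{begin, end}` pair of 
globals, and the helper name `wrapForJIT` are my assumptions, not part of 
this patch:
```
// Hypothetical sketch (not from the patch): hand the wrapper a pre-built
// entry array so the emitted registration code never references the
// __start_/__stop_ section symbols that ORC JIT does not reliably provide.
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Frontend/Offloading/OffloadWrapper.h" // assumed header location
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/Error.h"
#include <utility>

using namespace llvm;

// EntriesBegin/EntriesEnd would be globals like @__begin_offload_binary and
// @__end_offload_binary that the MLIR translation builds itself.
static Error wrapForJIT(const OffloadWrapper &Wrapper, Module &M,
                        ArrayRef<ArrayRef<char>> Images,
                        GlobalVariable *EntriesBegin,
                        GlobalVariable *EntriesEnd) {
  // Assuming EntryArrayTy is the {begin, end} pair of globals; when a value
  // is passed, the wrapper uses it instead of falling back to
  // offloading::getOffloadEntryArray(M, "omp_offloading_entries").
  return Wrapper.wrapOpenMPBinaries(M, Images,
                                    std::make_pair(EntriesBegin, EntriesEnd));
}
```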
As a concrete example, the following MLIR:
```
module attributes {gpu.container_module} {
  gpu.binary @binary <#gpu.offload_embedding<cuda>> [#gpu.object<#nvvm.target, bin = "BLOB">]
  llvm.func @func() {
    %1 = llvm.mlir.constant(1 : index) : i64
    gpu.launch_func  @binary::@hello blocks in (%1, %1, %1) threads in (%1, %1, %1) : i64
    gpu.launch_func  @binary::@world blocks in (%1, %1, %1) threads in (%1, %1, %1) : i64
    llvm.return
  }
}
```
Produces:
```
@__begin_offload_binary = internal constant [2 x %struct.__tgt_offload_entry] [%struct.__tgt_offload_entry { ptr @binary_Khello, ptr @.omp_offloading.entry_name, i64 0, i32 0, i32 0 }, %struct.__tgt_offload_entry { ptr @binary_Kworld, ptr @.omp_offloading.entry_name.2, i64 0, i32 0, i32 0 }]
@__end_offload_binary = internal constant ptr getelementptr inbounds (%struct.__tgt_offload_entry, ptr @__begin_offload_binary, i64 2)
@.fatbin_image.binary = internal constant [4 x i8] c"BLOB", section ".nv_fatbin"
@.fatbin_wrapper.binary = internal constant %fatbin_wrapper { i32 1180844977, i32 1, ptr @.fatbin_image.binary, ptr null }, section ".nvFatBinSegment", align 8
@.cuda.binary_handle.binary = internal global ptr null
@llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @.cuda.fatbin_reg.binary, ptr null }]
@binary_Khello = weak constant i8 0
@.omp_offloading.entry_name = internal unnamed_addr constant [6 x i8] c"hello\00"
@binary_Kworld = weak constant i8 0
@.omp_offloading.entry_name.2 = internal unnamed_addr constant [6 x i8] c"world\00"
...
```
And this works.

https://github.com/llvm/llvm-project/pull/78057