saiislam added inline comments.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:17143-17145
+  llvm::LoadInst *LD;
+  Constant *Offset, *Offset1;
+  Value *DP, *DP1;
----------------
arsenm wrote:
> Move down to define and initialize
There are multiple uses of the same identifier. Defining them four times looks 
odd.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:2550-2551
+    Error Err = retrieveAllMemoryPools();
+    if (Err)
+      return Plugin::error("Unable to retieve all memmory pools");
+
----------------
jhuber6 wrote:
> This and below isn't correct. You can't discard an `llvm::Error` value like 
> this without either doing `consumeError(std::move(Err))` or 
> `toString(std::move(Err))`. However, you don't need to consume these in the 
> first place, they already contain the error message from the callee and 
> should just be forwarded.
Removed the logic for preallocatedheap.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1752
+    // if (auto Err = preAllocateDeviceMemoryPool())
+    //   return Err;
+
----------------
jhuber6 wrote:
> saiislam wrote:
> > jhuber6 wrote:
> > > Leftoever?
> > No, it is not a left over.
> > One of the fields in cov5 implicitikernarg is heap_v1 ptr. It should point 
> > to a 128KB zero-initialized block of coarse-grained memory on each device 
> > before launching the kernel. This code was working a while ago, but right 
> > now it is failing most likely due to some latest change in devicertl memory 
> > handling mechanism.
> > I need to debug it with this patch, otherwise it will cause all target 
> > region code calling device-malloc to fail.
> > I will try to fix it before the next revision.
> Do we really need that? We only use a fraction of the existing implicit 
> arguments. My understanding is that most of these are more for runtime 
> handling for HIP and OpenCL while we would most likely want our own solution. 
> I'm assuming that the 128KB is not required for anything we use?
I have removed the preallocatedheap work from this patch.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h:36
 
-// The implicit arguments of AMDGPU kernels.
-struct AMDGPUImplicitArgsTy {
-  uint64_t OffsetX;
-  uint64_t OffsetY;
-  uint64_t OffsetZ;
-  uint64_t HostcallPtr;
-  uint64_t Unused0;
-  uint64_t Unused1;
-  uint64_t Unused2;
+enum IMPLICITARGS : uint32_t {
+  COV4_SIZE = 56,
----------------
jhuber6 wrote:
> saiislam wrote:
> > jhuber6 wrote:
> > > arsenm wrote:
> > > > This is getting duplicated a few places, should it move to a support 
> > > > header?
> > > > 
> > > > I don't love the existing APIs for this, I think a struct definition 
> > > > makes more sense
> > > The other user here is my custom loader, @JonChesterfield has talked 
> > > about wanting a common HSA helper header for awhile now.
> > > 
> > > I agree that the struct definition is much better. Being able to simply 
> > > allocate this size and then zero fill it is much cleaner.
> > Defining a struct for whole 256 byte of implicitargs in cov5 was becoming a 
> > little difficult due to different sizes of various fields (2, 4, 6, 8, 48, 
> > 72 bytes) along with multiple reserved fields in between. It made sense for 
> > cov4 because it only had 7 fields of 8 bytes each, where we needed only 4th 
> > field in OpenMP runtime (for hostcall_buffer).
> > 
> > Offset based lookups like the following allows handling/exposing only 
> > required fields across generations of ABI.
> If we don't use it, just put it as `unused`. It's really hard to read as-is 
> and it makes it more difficult to just zero fill.
I have reduced the fields to bare minimum required for OpenMP.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139730/new/

https://reviews.llvm.org/D139730

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to