Anastasia added a comment.

In https://reviews.llvm.org/D37822#872446, @yaxunl wrote:

> In https://reviews.llvm.org/D37822#872291, @Anastasia wrote:
>
> > Could you please explain a bit more why the alignment have to be put 
> > explicitly in the struct? I am just not very convinced this is general 
> > enough.
>
>
> The captured variables are fields of the block literal struct. Due to 
> alignment requirement of these fields, there is alignment requirement of
>  the block literal struct. The ISA of the block invoke function is generated 
> with the assumption of these alignments. If the block literal is
>  allocated at a memory address not satisfying the alignment requirement, the 
> kernel behavior is undefined.
>
> Generally, __enqueue_kernel library function needs to prepare the kernel 
> argument before launching the kernel. It usually does this by copying
>  the block literal to some buffer then pass the address of the buffer to the 
> kernel. Then the address of the buffer has to satisfy the alignment
>  requirement.
>
> If this block literal struct is not general enough, how about add another 
> field as target reserved size, and leave the remaining space of header for
>  target specific use. And add a target hook to allow target fill the reserved 
> space, e.g.
>
>   struct __opencl_block_literal {
>     int total_size;
>     int align;
>     __generic void *invoke;
>     int target_reserved_size; /* round up to 4 bytes */
>     int target_reserved[];
>     /* captures */
>   };
>


I like the idea of the target reserved part actually. But not sure how it could 
be used without adding any target specific methods?

However, I am still not clear why the alignment of this struct has to be 
different from any other struct Clang produces. Normally the alignment of 
objects have to be known during IR generation to put them correctly in the 
attributes of generated alloca, store and loads. But as a field inside struct I 
don't know how it can be useful. I would imagine `enqueue_kernel` would just 
operate on the block as if it would be an arbitrary buffer of data. Also would 
size of the struct not account for any padding to make sure the alignment can 
be deduced based on it correctly?



================
Comment at: lib/CodeGen/CGOpenCLRuntime.cpp:108
+llvm::PointerType *CGOpenCLRuntime::getGenericVoidPointerType() {
+  return llvm::IntegerType::getInt8PtrTy(
+      CGM.getLLVMContext(),
----------------
Should we put an assert of LangOpts.OpenCL?


================
Comment at: test/CodeGen/blocks-opencl.cl:1
 // RUN: %clang_cc1 -O0 %s -ffake-address-space-map -emit-llvm -o - -fblocks 
-triple x86_64-unknown-unknown | FileCheck %s
+// RUN: %clang_cc1 -O0 %s -emit-llvm -o - -fblocks -triple amdgcn | FileCheck 
%s
----------------
Btw, do you think we need this test any more? And if yes, could this be moved 
to CodeGenOpenCL?


https://reviews.llvm.org/D37822



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to