b-sumner added a comment.

In https://reviews.llvm.org/D37822#877903, @Anastasia wrote:

> In https://reviews.llvm.org/D37822#877572, @yaxunl wrote:
>
> > In https://reviews.llvm.org/D37822#873876, @Anastasia wrote:
> >
> > > In https://reviews.llvm.org/D37822#872446, @yaxunl wrote:
> > >
> > > > In https://reviews.llvm.org/D37822#872291, @Anastasia wrote:
> > > >
> > > > > Could you please explain a bit more why the alignment have to be put 
> > > > > explicitly in the struct? I am just not very convinced this is 
> > > > > general enough.
> > > >
> > > >
> > > > The captured variables are fields of the block literal struct. Due to 
> > > > alignment requirement of these fields, there is alignment requirement of
> > > >  the block literal struct. The ISA of the block invoke function is 
> > > > generated with the assumption of these alignments. If the block literal 
> > > > is
> > > >  allocated at a memory address not satisfying the alignment 
> > > > requirement, the kernel behavior is undefined.
> > > >
> > > > Generally, __enqueue_kernel library function needs to prepare the 
> > > > kernel argument before launching the kernel. It usually does this by 
> > > > copying
> > > >  the block literal to some buffer then pass the address of the buffer 
> > > > to the kernel. Then the address of the buffer has to satisfy the 
> > > > alignment
> > > >  requirement.
> > > >
> > > > If this block literal struct is not general enough, how about add 
> > > > another field as target reserved size, and leave the remaining space of 
> > > > header for
> > > >  target specific use. And add a target hook to allow target fill the 
> > > > reserved space, e.g.
> > > >
> > > >   struct __opencl_block_literal {
> > > >     int total_size;
> > > >     int align;
> > > >     __generic void *invoke;
> > > >     int target_reserved_size; /* round up to 4 bytes */
> > > >     int target_reserved[];
> > > >     /* captures */
> > > >   };
> > > >
> > >
> > >
> > > I like the idea of the target reserved part actually. But not sure how it 
> > > could be used without adding any target specific methods?
> >
> >
> > If we decide to add target reserved fields, I can add target hooks to fill 
> > these fields. However I would suggest to leave this for future since I 
> > don't see there is need for other fields for now.
>
>
> I could imagine it can be usefull for some vendor implementations.
>
> >> However, I am still not clear why the alignment of this struct has to be 
> >> different from any other struct Clang produces. Normally the alignment of 
> >> objects have to be known during IR generation to put them correctly in the 
> >> attributes of generated alloca, store and loads. But as a field inside 
> >> struct I don't know how it can be useful. I would imagine `enqueue_kernel` 
> >> would just operate on the block as if it would be an arbitrary buffer of 
> >> data. Also would size of the struct not account for any padding to make 
> >> sure the alignment can be deduced based on it correctly?
> > 
> > enqueue_kernel needs to pass the block struct to the kernel. Let's assume 
> > it does this by copying the block struct to a buffer. If enqueue_kernel 
> > does not know the alignment of the struct, it can only put it at an 
> > arbitrary address in the buffer. Then the kernel has to copy the struct to 
> > an aligned private memory and load the fields. However, if the 
> > enqueued_kernel knows the alignment of the struct, it can put it at an 
> > address satisfying the alignment. Then the kernel can load the fields 
> > directly from the buffer, skips the step of copying to an aligned private 
> > memory. Therefore, alignment of the block struct is usually a useful 
> > information for enqueue_kernel. I think that's why in the SPIRV spec 
> > OpEnqueueKernel requires an alignment operand for the block context.
>
> Ok, I just think in C if you use `malloc` to obtain a pointer to some memory 
> location it doesn't take any alignment information. Then you can use the 
> pointer to copy any data including the struct into the location its pointed 
> to. And the pointer can be used later on correctly. I think the alignment is 
> deduced in this case from the type or the size of an object. Do you know 
> where the alignment information is used for SPIRV call? Also how is the block 
> represented in SPIRV?


Actually malloc alignment is not sufficient more many uses such as CPU 
supported vectors, e.g. AVX512 or passed to create buffer with 
use-host-pointer.  In such cases you need posix_memalign or some similar API.  
Having the alignment means it is available if needed.  If an implementation 
doesn't need it, there is no harm is there?


https://reviews.llvm.org/D37822



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to