yaxunl marked 2 inline comments as done.
yaxunl added a comment.

In D71221#1791802 <https://reviews.llvm.org/D71221#1791802>, @tra wrote:

> What's the use case for this flag?


If a kernel is launched with a block size greater than the default max block 
size, explicit launch bound is required.

Different projects have different block size usages.

If a project mostly uses block size 1024, it prefers to use 1024 as the default 
max block size to avoid adding explicit launch bounds to each kernel.

If a project mostly uses block size 256, it prefers to use 256 as the default 
max block size.

Another situation is that at the initial development stage people prefer a 
default max block size that works for all possible launching block sizes. Then 
they can just let the max block size be 1024 by using this option. Later on, 
they can add launch bounds and choose a different max block size.

On the other hand, we cannot simply let the default max block size be 1024 
since we have large sets of existing projects assuming default max block size 
be 256. Changing the default max block size to 1024 will cause perf degradation 
in the existing projects. Adding this options provides an option for backward 
compatibility in case we want to change the default max block size.



================
Comment at: clang/lib/CodeGen/TargetInfo.cpp:8067
+    unsigned MaxThreadsPerBlock =
+        IsHIPKernel ? M.getLangOpts().GPUMaxThreadsPerBlock : 256;
+    std::string AttrVal = std::string("1,") + llvm::utostr(MaxThreadsPerBlock);
----------------
tra wrote:
> The magic value of 256 should be defined as a constant or macro somewhere -- 
> you're using it in multiple places.
> Alternatively, always set LangOpts.GPUMaxThreadsPerBlock to something and 
> skip figuring out the default everywhere else.
For the default value of LangOpts.GPUMaxThreadsPerBlock, it tends to be target 
dependent. I am thinking probably should add 
TargetInfo.getDefaultMaxThreadsPerBlock() and use it to set the default value 
for LangOpts.GPUMaxThreadsPerBlock.


================
Comment at: clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu:19
+
 __attribute__((amdgpu_flat_work_group_size(32, 64))) // expected-no-diagnostics
 __global__ void flat_work_group_size_32_64() {
----------------
tra wrote:
> Is this the attribute that `__launch_bounds__()` expands to in HIP?
> If __launch_bounds__ is a separate attribute, then, I guess, it should be 
> tested, too.
yes.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71221/new/

https://reviews.llvm.org/D71221



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to