https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/88398
Summary:
Currently we treat this attribute as a minimum for the number of blocks
scheduled on the kernel. However, the documentation states that it applies to
the number of CTAs mapped onto a *single* SM. Because we set it to the total
number of blocks, the value will almost always be out of range, producing a
warning and then being ignored. We have no good way to automatically determine
how many CTAs fit on a single SM, nor is it clear that we should, so this is
best left to users adding it manually.

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#performance-tuning-directives-minnctapersm

>From b1229aeb26c67b73bf3af6449f7f4eb9d0ad2c6a Mon Sep 17 00:00:00 2001
From: Joseph Huber <hube...@outlook.com>
Date: Thu, 11 Apr 2024 10:14:00 -0500
Subject: [PATCH] [OpenMP] Remove 'minncta' attributes from NVPTX kernels

Summary:
Currently we treat this attribute as a minimum for the number of blocks
scheduled on the kernel. However, the documentation states that it applies to
the number of CTAs mapped onto a *single* SM. Because we set it to the total
number of blocks, the value will almost always be out of range, producing a
warning and then being ignored. We have no good way to automatically determine
how many CTAs fit on a single SM, nor is it clear that we should, so this is
best left to users adding it manually.

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#performance-tuning-directives-minnctapersm
---
 clang/test/OpenMP/ompx_attributes_codegen.cpp | 3 +--
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp     | 4 +---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/clang/test/OpenMP/ompx_attributes_codegen.cpp b/clang/test/OpenMP/ompx_attributes_codegen.cpp
index 6735972c6b1070..87eb2913537ba5 100644
--- a/clang/test/OpenMP/ompx_attributes_codegen.cpp
+++ b/clang/test/OpenMP/ompx_attributes_codegen.cpp
@@ -36,6 +36,5 @@ void func() {
 // NVIDIA: "omp_target_thread_limit"="45"
 // NVIDIA: "omp_target_thread_limit"="17"
 // NVIDIA: !{ptr @__omp_offloading[[HASH1:.*]]_l16, !"maxntidx", i32 20}
-// NVIDIA: !{ptr @__omp_offloading[[HASH2:.*]]_l18, !"minctasm", i32 90}
-// NVIDIA: !{ptr @__omp_offloading[[HASH2]]_l18, !"maxntidx", i32 45}
+// NVIDIA: !{ptr @__omp_offloading[[HASH2:.*]]_l18, !"maxntidx", i32 45}
 // NVIDIA: !{ptr @__omp_offloading[[HASH3:.*]]_l20, !"maxntidx", i32 17}
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 7fd8474c2ec890..4d2d352f7520b2 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -4786,11 +4786,9 @@ OpenMPIRBuilder::readTeamBoundsForKernel(const Triple &, Function &Kernel) {
 void OpenMPIRBuilder::writeTeamsForKernel(const Triple &T, Function &Kernel,
                                           int32_t LB, int32_t UB) {
-  if (T.isNVPTX()) {
+  if (T.isNVPTX())
     if (UB > 0)
       updateNVPTXMetadata(Kernel, "maxclusterrank", UB, true);
-    updateNVPTXMetadata(Kernel, "minctasm", LB, false);
-  }
   if (T.isAMDGPU())
     Kernel.addFnAttr("amdgpu-max-num-workgroups", llvm::utostr(LB) + ",1,1");
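
For reference, a user who still wants a per-SM minimum after this change can
request it manually. A minimal sketch in CUDA (kernel name and values are
illustrative, not taken from this patch): the second argument of
__launch_bounds__ is the minimum number of resident CTAs per SM, which is what
ends up as the "minctasm" annotation and ultimately the .minnctapersm PTX
directive.

// Request at most 128 threads per block and at least 2 resident CTAs per SM.
// The second __launch_bounds__ argument is what becomes .minnctapersm;
// the kernel name and the values 128/2 are purely illustrative.
__global__ void __launch_bounds__(128, 2) example_kernel(int *out) {
  out[blockIdx.x * blockDim.x + threadIdx.x] = threadIdx.x;
}

For OpenMP offloading, the same launch_bounds attribute is what the modified
ompx_attributes_codegen.cpp test forwards to the kernel via the ompx_attribute
clause.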