[lldb] [llvm] [lld] [clang-tools-extra] [clang] [flang] [libcxx] [libc] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)
https://github.com/dstutt closed https://github.com/llvm/llvm-project/pull/67104 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [llvm] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)
https://github.com/dstutt updated https://github.com/llvm/llvm-project/pull/67104

>From 259138920126f09149b488fc54e8d2a7da969ca4 Mon Sep 17 00:00:00 2001
From: David Stuttard
Date: Thu, 24 Aug 2023 16:45:50 +0100
Subject: [PATCH 1/3] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs

---
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |  28 +-
 .../AMDGPU/pal-metadata-3.0-callable.ll       | 290 ++
 2 files changed, 314 insertions(+), 4 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index b2360ce30fd6e..22ecd3656d00a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -1098,10 +1098,30 @@ void AMDGPUAsmPrinter::emitPALFunctionMetadata(const MachineFunction &MF) {
   StringRef FnName = MF.getFunction().getName();
   MD->setFunctionScratchSize(FnName, MFI.getStackSize());
 
-  // Set compute registers
-  MD->setRsrc1(CallingConv::AMDGPU_CS,
-               CurrentProgramInfo.getPGMRSrc1(CallingConv::AMDGPU_CS));
-  MD->setRsrc2(CallingConv::AMDGPU_CS, CurrentProgramInfo.getComputePGMRSrc2());
+  if (MD->getPALMajorVersion() < 3) {
+    // Set compute registers
+    MD->setRsrc1(CallingConv::AMDGPU_CS,
+                 CurrentProgramInfo.getPGMRSrc1(CallingConv::AMDGPU_CS));
+    MD->setRsrc2(CallingConv::AMDGPU_CS,
+                 CurrentProgramInfo.getComputePGMRSrc2());
+  } else {
+    MD->setHwStage(CallingConv::AMDGPU_CS, ".ieee_mode",
+                   (bool)CurrentProgramInfo.IEEEMode);
+    MD->setHwStage(CallingConv::AMDGPU_CS, ".wgp_mode",
+                   (bool)CurrentProgramInfo.WgpMode);
+    MD->setHwStage(CallingConv::AMDGPU_CS, ".mem_ordered",
+                   (bool)CurrentProgramInfo.MemOrdered);
+
+    MD->setHwStage(CallingConv::AMDGPU_CS, ".trap_present",
+                   (bool)CurrentProgramInfo.TrapHandlerEnable);
+    MD->setHwStage(CallingConv::AMDGPU_CS, ".excp_en",
+                   CurrentProgramInfo.EXCPEnable);
+
+    const unsigned LdsDwGranularity = 128;
+    MD->setHwStage(CallingConv::AMDGPU_CS, ".lds_size",
+                   (unsigned)(CurrentProgramInfo.LdsSize * LdsDwGranularity *
+                              sizeof(uint32_t)));
+  }
 
   // Set optional info
   MD->setFunctionLdsSize(FnName, CurrentProgramInfo.LDSSize);
diff --git a/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll b/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll
new file mode 100644
index 0..d4a5f61aced61
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll
@@ -0,0 +1,290 @@
+; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s
+
+; CHECK:      .amdgpu_pal_metadata
+; CHECK-NEXT: ---
+; CHECK-NEXT: amdpal.pipelines:
+; CHECK-NEXT:   - .api: Vulkan
+; CHECK-NEXT:     .compute_registers:
+; CHECK-NEXT:       .tg_size_en: true
+; CHECK-NEXT:       .tgid_x_en: false
+; CHECK-NEXT:       .tgid_y_en: false
+; CHECK-NEXT:       .tgid_z_en: false
+; CHECK-NEXT:       .tidig_comp_cnt: 0x1
+; CHECK-NEXT:     .hardware_stages:
+; CHECK-NEXT:       .cs:
+; CHECK-NEXT:         .checksum_value: 0x9444d7d0
+; CHECK-NEXT:         .debug_mode: 0
+; CHECK-NEXT:         .excp_en: 0
+; CHECK-NEXT:         .float_mode: 0xc0
+; CHECK-NEXT:         .ieee_mode: true
+; CHECK-NEXT:         .image_op: false
+; CHECK-NEXT:         .lds_size: 0x200
+; CHECK-NEXT:         .mem_ordered: true
+; CHECK-NEXT:         .sgpr_limit: 0x6a
+; CHECK-NEXT:         .threadgroup_dimensions:
+; CHECK-NEXT:           - 0x1
+; CHECK-NEXT:           - 0x400
+; CHECK-NEXT:           - 0x1
+; CHECK-NEXT:         .trap_present: false
+; CHECK-NEXT:         .user_data_reg_map:
+; CHECK-NEXT:           - 0x1000
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT:           - 0x
+; CHECK-NEXT
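The `.lds_size` computation in the patch above converts an LDS allocation counted in granules into bytes. A minimal sketch of that arithmetic follows, assuming (as the patch's `LdsDwGranularity` constant suggests) that one granule is 128 dwords and a dword is 4 bytes. The names `kLdsDwGranularity` and `ldsGranulesToBytes` are hypothetical and do not exist in LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Assumption taken from the patch: LDS is allocated in granules of 128 dwords.
constexpr unsigned kLdsDwGranularity = 128;

// Hypothetical helper (not part of LLVM): convert an LDS allocation expressed
// in granules into the byte count that would be written to ".lds_size".
constexpr unsigned ldsGranulesToBytes(unsigned Granules) {
  return static_cast<unsigned>(Granules * kLdsDwGranularity * sizeof(uint32_t));
}

// One granule is 128 * 4 = 512 bytes, i.e. 0x200.
static_assert(ldsGranulesToBytes(1) == 0x200, "one granule is 512 bytes");
```

Under these assumptions a single granule yields 0x200 bytes, which would be consistent with the `.lds_size: 0x200` line the test expects.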
[clang-tools-extra] [llvm] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)
dstutt wrote:

Yes, this is still relevant (sorry, I had forgotten about it). I am just double-checking that no extra changes are required after the recent update to getPGMRSrc1.

https://github.com/llvm/llvm-project/pull/67104
[lld] [lldb] [libcxx] [clang] [libc] [clang-tools-extra] [flang] [llvm] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)
@@ -1127,10 +1131,16 @@ void AMDGPUAsmPrinter::emitPALFunctionMetadata(const MachineFunction &MF) {
   MD->setFunctionScratchSize(FnName, MFI.getStackSize());
   const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
-  // Set compute registers
-  MD->setRsrc1(CallingConv::AMDGPU_CS,
-               CurrentProgramInfo.getPGMRSrc1(CallingConv::AMDGPU_CS, ST));
-  MD->setRsrc2(CallingConv::AMDGPU_CS, CurrentProgramInfo.getComputePGMRSrc2());
+  if (MD->getPALMajorVersion() < 3) {
+    // Set compute registers
+    MD->setRsrc1(CallingConv::AMDGPU_CS,
+                 CurrentProgramInfo.getPGMRSrc1(CallingConv::AMDGPU_CS, ST));
+    MD->setRsrc2(CallingConv::AMDGPU_CS,
+                 CurrentProgramInfo.getComputePGMRSrc2());
+  } else {
+    EmitPALMetadataCommon(MD, CurrentProgramInfo, CallingConv::AMDGPU_CS,
+                          *getGlobalSTI());

dstutt wrote:

Thanks - done.

https://github.com/llvm/llvm-project/pull/67104
[lld] [lldb] [libcxx] [clang] [libc] [clang-tools-extra] [flang] [llvm] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)
@@ -1025,6 +1025,26 @@ void AMDGPUAsmPrinter::EmitProgramInfoSI(const MachineFunction &MF,
   OutStreamer->emitInt32(MFI->getNumSpilledVGPRs());
 }
+// Helper function to add common PAL Metadata 3.0+
+static void EmitPALMetadataCommon(AMDGPUPALMetadata *MD,
+                                  const SIProgramInfo &CurrentProgramInfo,
+                                  CallingConv::ID CC,
+                                  const MCSubtargetInfo &ST) {
+  MD->setHwStage(CC, ".ieee_mode", (bool)CurrentProgramInfo.IEEEMode);

dstutt wrote:

I can easily add that though - and that does mirror the recent change to getPGMRSrc1.

https://github.com/llvm/llvm-project/pull/67104
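The review hunks above move the PAL metadata 3.0 fields into a shared helper and choose between raw register writes and named hardware-stage fields based on the PAL major version. A rough, self-contained sketch of that dispatch follows. `MockProgramInfo`, `MockPALMetadata`, `emitCommon`, and `emitFunctionMetadata` are hypothetical stand-ins invented for illustration; the real `SIProgramInfo` and `AMDGPUPALMetadata` classes have different members and storage.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for SIProgramInfo (illustration only).
struct MockProgramInfo {
  bool IEEEMode = true;
  bool WgpMode = false;
  bool MemOrdered = true;
  bool TrapHandlerEnable = false;
  uint32_t EXCPEnable = 0;
  uint32_t Rsrc1 = 0;
  uint32_t Rsrc2 = 0;
};

// Hypothetical stand-in for AMDGPUPALMetadata (illustration only).
struct MockPALMetadata {
  unsigned MajorVersion = 3;
  std::map<std::string, uint32_t> HwStage;   // named fields, PAL metadata 3.0+
  std::map<std::string, uint32_t> Registers; // raw registers, before 3.0
};

// Shared helper: emit the fields common to every PAL metadata 3.0+ stage.
void emitCommon(MockPALMetadata &MD, const MockProgramInfo &PI) {
  MD.HwStage[".ieee_mode"] = PI.IEEEMode;
  MD.HwStage[".wgp_mode"] = PI.WgpMode;
  MD.HwStage[".mem_ordered"] = PI.MemOrdered;
  MD.HwStage[".trap_present"] = PI.TrapHandlerEnable;
  MD.HwStage[".excp_en"] = PI.EXCPEnable;
}

// Version dispatch: older metadata gets register values, newer gets fields.
void emitFunctionMetadata(MockPALMetadata &MD, const MockProgramInfo &PI) {
  if (MD.MajorVersion < 3) {
    MD.Registers["rsrc1"] = PI.Rsrc1;
    MD.Registers["rsrc2"] = PI.Rsrc2;
  } else {
    emitCommon(MD, PI);
  }
}
```

Keeping the 3.0+ fields in one helper means the callable-function path and the entry-point path cannot drift apart, which appears to be the motivation for the refactoring discussed in the review.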
[clang] 7940888 - [AMDGPU] Intrinsic to expose s_wait_event for export ready
Author: David Stuttard Date: 2022-11-28T11:26:15Z New Revision: 7940888c5987de2b5cbb4ec45b482df88e822f67 URL: https://github.com/llvm/llvm-project/commit/7940888c5987de2b5cbb4ec45b482df88e822f67 DIFF: https://github.com/llvm/llvm-project/commit/7940888c5987de2b5cbb4ec45b482df88e822f67.diff LOG: [AMDGPU] Intrinsic to expose s_wait_event for export ready Differential Revision: https://reviews.llvm.org/D138216 Added: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll Modified: clang/include/clang/Basic/BuiltinsAMDGPU.def clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl llvm/include/llvm/IR/IntrinsicsAMDGPU.td llvm/lib/Target/AMDGPU/SOPInstructions.td Removed: diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index d4d16d5a9563d..5e64f830fb850 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -261,6 +261,7 @@ TARGET_BUILTIN(__builtin_amdgcn_image_bvh_intersect_ray_lh, "V4UiWUifV4fV4hV4hV4 // TODO: This is a no-op in wave32. Should the builtin require wavefrontsize64? TARGET_BUILTIN(__builtin_amdgcn_permlane64, "UiUi", "nc", "gfx11-insts") +TARGET_BUILTIN(__builtin_amdgcn_s_wait_event_export_ready, "v", "n", "gfx11-insts") //===--===// // WMMA builtins. 
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl
index a4f2d610afa83..59a16900fb1a4 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl
@@ -37,3 +37,9 @@ void test_ds_bvh_stack_rtn(global uint2* out, uint addr, uint data, uint4 data1)
 void test_permlane64(global uint* out, uint a) {
   *out = __builtin_amdgcn_permlane64(a);
 }
+
+// CHECK-LABEL: @test_s_wait_event_export_ready
+// CHECK: call void @llvm.amdgcn.s.wait.event.export.ready
+void test_s_wait_event_export_ready() {
+  __builtin_amdgcn_s_wait_event_export_ready();
+}
diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 8f05eb10920c7..3e9233b1f86f9 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -2067,6 +2067,10 @@ def int_amdgcn_wmma_bf16_16x16x16_bf16 : AMDGPUWmmaIntrinsicOPSEL;
 def int_amdgcn_wmma_i32_16x16x16_iu4 : AMDGPUWmmaIntrinsicIU;
+def int_amdgcn_s_wait_event_export_ready :
+  ClangBuiltin<"__builtin_amdgcn_s_wait_event_export_ready">,
+  Intrinsic<[], [], [IntrNoMem, IntrHasSideEffects, IntrWillReturn]
+>;
 //===--===//
 // Deep learning intrinsics.
diff --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index 674da1f0ae4a5..ce0b0dfc48ced 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -1388,7 +1388,9 @@ let SubtargetPredicate = isGFX10Plus in {
 let SubtargetPredicate = isGFX11Plus in {
   def S_WAIT_EVENT : SOPP_Pseudo<"s_wait_event", (ins s16imm:$simm16),
-    "$simm16">;
+    "$simm16"> {
+      let hasSideEffects = 1;
+    }
   def S_DELAY_ALU : SOPP_Pseudo<"s_delay_alu", (ins DELAY_FLAG:$simm16),
     "$simm16">;
 } // End SubtargetPredicate = isGFX11Plus
@@ -1430,6 +1432,10 @@ def : GCNPat<
   (S_SEXT_I32_I16 $src)
 >;
+def : GCNPat <
+  (int_amdgcn_s_wait_event_export_ready),
+    (S_WAIT_EVENT (i16 0))
+>;
 //===--===//
 // SOP2 Patterns
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
new file mode 100644
index 0..3e95e4dec67a2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
@@ -0,0 +1,15 @@
+; RUN: llc -global-isel=0 -march=amdgcn -verify-machineinstrs -mcpu=gfx1100 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -global-isel -march=amdgcn -verify-machineinstrs -mcpu=gfx1100 < %s | FileCheck -check-prefix=GCN %s
+
+; GCN-LABEL: {{^}}test_wait_event:
+; GCN: s_wait_event 0x0
+
+define amdgpu_ps void @test_wait_event() #0 {
+entry:
+  call void @llvm.amdgcn.s.wait.event.export.ready() #0
+  ret void
+}
+
+declare void @llvm.amdgcn.s.wait.event.export.ready() #0
+
+attributes #0 = { nounwind }
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [LLVM] Add intrinsics for v_cvt_pk_norm_{i16, u16}_f16 (PR #135631)
dstutt wrote: Check code formatting job is failing in a weird way. I can't work out what the issue is. https://github.com/llvm/llvm-project/pull/135631
[clang] [llvm] [LLVM] Add intrinsics for v_cvt_pk_norm_{i16, u16}_f16 (PR #135631)
dstutt wrote: Looks like the undefs are causing some issues. Presumably that's deliberate here? (So we should ignore the failure?) https://github.com/llvm/llvm-project/pull/135631