[lldb] [llvm] [lld] [clang-tools-extra] [clang] [flang] [libcxx] [libc] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)

2024-02-06 Thread David Stuttard via cfe-commits

https://github.com/dstutt closed https://github.com/llvm/llvm-project/pull/67104
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang-tools-extra] [llvm] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)

2024-02-01 Thread David Stuttard via cfe-commits

https://github.com/dstutt updated 
https://github.com/llvm/llvm-project/pull/67104

>From 259138920126f09149b488fc54e8d2a7da969ca4 Mon Sep 17 00:00:00 2001
From: David Stuttard 
Date: Thu, 24 Aug 2023 16:45:50 +0100
Subject: [PATCH 1/3] [AMDGPU] Add pal metadata 3.0 support to callable pal
 funcs

---
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |  28 +-
 .../AMDGPU/pal-metadata-3.0-callable.ll   | 290 ++
 2 files changed, 314 insertions(+), 4 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index b2360ce30fd6e..22ecd3656d00a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -1098,10 +1098,30 @@ void AMDGPUAsmPrinter::emitPALFunctionMetadata(const MachineFunction &MF) {
   StringRef FnName = MF.getFunction().getName();
   MD->setFunctionScratchSize(FnName, MFI.getStackSize());
 
-  // Set compute registers
-  MD->setRsrc1(CallingConv::AMDGPU_CS,
-               CurrentProgramInfo.getPGMRSrc1(CallingConv::AMDGPU_CS));
-  MD->setRsrc2(CallingConv::AMDGPU_CS, CurrentProgramInfo.getComputePGMRSrc2());
+  if (MD->getPALMajorVersion() < 3) {
+    // Set compute registers
+    MD->setRsrc1(CallingConv::AMDGPU_CS,
+                 CurrentProgramInfo.getPGMRSrc1(CallingConv::AMDGPU_CS));
+    MD->setRsrc2(CallingConv::AMDGPU_CS,
+                 CurrentProgramInfo.getComputePGMRSrc2());
+  } else {
+    MD->setHwStage(CallingConv::AMDGPU_CS, ".ieee_mode",
+                   (bool)CurrentProgramInfo.IEEEMode);
+    MD->setHwStage(CallingConv::AMDGPU_CS, ".wgp_mode",
+                   (bool)CurrentProgramInfo.WgpMode);
+    MD->setHwStage(CallingConv::AMDGPU_CS, ".mem_ordered",
+                   (bool)CurrentProgramInfo.MemOrdered);
+
+    MD->setHwStage(CallingConv::AMDGPU_CS, ".trap_present",
+                   (bool)CurrentProgramInfo.TrapHandlerEnable);
+    MD->setHwStage(CallingConv::AMDGPU_CS, ".excp_en",
+                   CurrentProgramInfo.EXCPEnable);
+
+    const unsigned LdsDwGranularity = 128;
+    MD->setHwStage(CallingConv::AMDGPU_CS, ".lds_size",
+                   (unsigned)(CurrentProgramInfo.LdsSize * LdsDwGranularity *
+                              sizeof(uint32_t)));
+  }
 
   // Set optional info
   MD->setFunctionLdsSize(FnName, CurrentProgramInfo.LDSSize);
diff --git a/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll b/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll
new file mode 100644
index 0..d4a5f61aced61
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll
@@ -0,0 +1,290 @@
+; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s
+
+; CHECK:           .amdgpu_pal_metadata
+; CHECK-NEXT: ---
+; CHECK-NEXT: amdpal.pipelines:
+; CHECK-NEXT:  - .api:            Vulkan
+; CHECK-NEXT:    .compute_registers:
+; CHECK-NEXT:      .tg_size_en:     true
+; CHECK-NEXT:      .tgid_x_en:      false
+; CHECK-NEXT:      .tgid_y_en:      false
+; CHECK-NEXT:      .tgid_z_en:      false
+; CHECK-NEXT:      .tidig_comp_cnt: 0x1
+; CHECK-NEXT:    .hardware_stages:
+; CHECK-NEXT:      .cs:
+; CHECK-NEXT:        .checksum_value: 0x9444d7d0
+; CHECK-NEXT:        .debug_mode:     0
+; CHECK-NEXT:        .excp_en:        0
+; CHECK-NEXT:        .float_mode:     0xc0
+; CHECK-NEXT:        .ieee_mode:      true
+; CHECK-NEXT:        .image_op:       false
+; CHECK-NEXT:        .lds_size:       0x200
+; CHECK-NEXT:        .mem_ordered:    true
+; CHECK-NEXT:        .sgpr_limit:     0x6a
+; CHECK-NEXT:        .threadgroup_dimensions:
+; CHECK-NEXT:          - 0x1
+; CHECK-NEXT:          - 0x400
+; CHECK-NEXT:          - 0x1
+; CHECK-NEXT:        .trap_present:   false
+; CHECK-NEXT:        .user_data_reg_map:
+; CHECK-NEXT:          - 0x1000
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT
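
For readers following the `.lds_size` expression in the AMDGPUAsmPrinter.cpp hunk above, here is a minimal sketch of the arithmetic. It assumes, as the patch's `LdsDwGranularity` constant suggests, that `LdsSize` counts granules of 128 dwords. The helper name `ldsGranulesToBytes` is hypothetical and not part of LLVM. The test's expected `.lds_size: 0x200` is consistent with a value of one granule, though the thread does not state that explicitly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): converts an LDS size expressed in
// allocation granules into bytes, using the same expression as the patch.
constexpr unsigned ldsGranulesToBytes(unsigned LdsSizeGranules) {
  // Assumption taken from the patch: one granule is 128 dwords.
  constexpr unsigned LdsDwGranularity = 128;
  return static_cast<unsigned>(LdsSizeGranules * LdsDwGranularity *
                               sizeof(uint32_t));
}

// One granule is 128 dwords * 4 bytes = 512 bytes = 0x200.
static_assert(ldsGranulesToBytes(1) == 0x200u, "one granule is 512 bytes");

// Runtime variant of the same check, for use from a test or debugger.
inline void checkLdsGranuleExample() {
  assert(ldsGranulesToBytes(1) == 512u);
}
```

Under this reading the `.lds_size` field holds a byte count rather than the granule count stored in `LdsSize`.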

[clang-tools-extra] [llvm] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)

2024-02-01 Thread David Stuttard via cfe-commits

dstutt wrote:

Yes, this is still relevant (sorry, I had forgotten about it).
I am just double-checking that no extra changes are required after the recent
update to getPGMRSrc1.


https://github.com/llvm/llvm-project/pull/67104


[llvm] [clang-tools-extra] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)

2024-02-01 Thread David Stuttard via cfe-commits

https://github.com/dstutt updated 
https://github.com/llvm/llvm-project/pull/67104

>From 259138920126f09149b488fc54e8d2a7da969ca4 Mon Sep 17 00:00:00 2001
From: David Stuttard 
Date: Thu, 24 Aug 2023 16:45:50 +0100
Subject: [PATCH 1/4] [AMDGPU] Add pal metadata 3.0 support to callable pal
 funcs

---
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |  28 +-
 .../AMDGPU/pal-metadata-3.0-callable.ll   | 290 ++
 2 files changed, 314 insertions(+), 4 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll


[clang] [lldb] [libc] [flang] [lld] [clang-tools-extra] [libcxx] [llvm] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)

2024-02-02 Thread David Stuttard via cfe-commits

https://github.com/dstutt updated 
https://github.com/llvm/llvm-project/pull/67104

>From 259138920126f09149b488fc54e8d2a7da969ca4 Mon Sep 17 00:00:00 2001
From: David Stuttard 
Date: Thu, 24 Aug 2023 16:45:50 +0100
Subject: [PATCH 1/4] [AMDGPU] Add pal metadata 3.0 support to callable pal
 funcs

---
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |  28 +-
 .../AMDGPU/pal-metadata-3.0-callable.ll   | 290 ++
 2 files changed, 314 insertions(+), 4 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll


[lld] [libc] [lldb] [flang] [clang] [libcxx] [llvm] [clang-tools-extra] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)

2024-02-05 Thread David Stuttard via cfe-commits

https://github.com/dstutt updated 
https://github.com/llvm/llvm-project/pull/67104

>From 259138920126f09149b488fc54e8d2a7da969ca4 Mon Sep 17 00:00:00 2001
From: David Stuttard 
Date: Thu, 24 Aug 2023 16:45:50 +0100
Subject: [PATCH 1/4] [AMDGPU] Add pal metadata 3.0 support to callable pal
 funcs

---
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |  28 +-
 .../AMDGPU/pal-metadata-3.0-callable.ll   | 290 ++
 2 files changed, 314 insertions(+), 4 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll


[lldb] [lld] [clang] [libcxx] [flang] [clang-tools-extra] [llvm] [libc] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)

2024-02-05 Thread David Stuttard via cfe-commits

https://github.com/dstutt updated 
https://github.com/llvm/llvm-project/pull/67104

>From 259138920126f09149b488fc54e8d2a7da969ca4 Mon Sep 17 00:00:00 2001
From: David Stuttard 
Date: Thu, 24 Aug 2023 16:45:50 +0100
Subject: [PATCH 1/5] [AMDGPU] Add pal metadata 3.0 support to callable pal
 funcs

---
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |  28 +-
 .../AMDGPU/pal-metadata-3.0-callable.ll   | 290 ++
 2 files changed, 314 insertions(+), 4 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll


[lld] [lldb] [libcxx] [clang] [libc] [clang-tools-extra] [flang] [llvm] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)

2024-02-05 Thread David Stuttard via cfe-commits


@@ -1127,10 +1131,16 @@ void AMDGPUAsmPrinter::emitPALFunctionMetadata(const MachineFunction &MF) {
   MD->setFunctionScratchSize(FnName, MFI.getStackSize());
   const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
 
-  // Set compute registers
-  MD->setRsrc1(CallingConv::AMDGPU_CS,
-               CurrentProgramInfo.getPGMRSrc1(CallingConv::AMDGPU_CS, ST));
-  MD->setRsrc2(CallingConv::AMDGPU_CS, CurrentProgramInfo.getComputePGMRSrc2());
+  if (MD->getPALMajorVersion() < 3) {
+    // Set compute registers
+    MD->setRsrc1(CallingConv::AMDGPU_CS,
+                 CurrentProgramInfo.getPGMRSrc1(CallingConv::AMDGPU_CS, ST));
+    MD->setRsrc2(CallingConv::AMDGPU_CS,
+                 CurrentProgramInfo.getComputePGMRSrc2());
+  } else {
+    EmitPALMetadataCommon(MD, CurrentProgramInfo, CallingConv::AMDGPU_CS,
+                          *getGlobalSTI());

dstutt wrote:

Thanks - done.
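
The version gate in the hunk quoted above can be sketched in isolation as follows. This is only an illustration under stated assumptions: `MockPalMetadata`, `MockProgramInfo`, `emitCommon`, and `emitFunctionMetadata` are hypothetical stand-ins, not LLVM's `AMDGPUPALMetadata`, `SIProgramInfo`, or `EmitPALMetadataCommon`, and the `"rsrc1"` key is invented for the example. It shows only the shape of the dispatch: packed register values before major version 3, named hardware-stage fields from 3.0 on.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the metadata object (not LLVM's class).
struct MockPalMetadata {
  unsigned MajorVersion = 2;
  std::map<std::string, uint32_t> Registers; // pre-3.0: packed registers
  std::map<std::string, uint32_t> HwStage;   // 3.0+: named per-stage fields
};

// Hypothetical stand-in for the program-info fields the patch reads.
struct MockProgramInfo {
  bool IEEEMode = true;
  bool WgpMode = false;
  bool MemOrdered = true;
  uint32_t PackedRsrc1 = 0;
};

// Shared helper in the spirit of the one discussed in review: it writes
// each setting as its own named field.
inline void emitCommon(MockPalMetadata &MD, const MockProgramInfo &Info) {
  MD.HwStage[".ieee_mode"] = Info.IEEEMode;
  MD.HwStage[".wgp_mode"] = Info.WgpMode;
  MD.HwStage[".mem_ordered"] = Info.MemOrdered;
}

// Dispatch on the metadata major version, as the hunk does.
inline void emitFunctionMetadata(MockPalMetadata &MD,
                                 const MockProgramInfo &Info) {
  if (MD.MajorVersion < 3) {
    // Older format: a single packed register value.
    MD.Registers["rsrc1"] = Info.PackedRsrc1;
  } else {
    // 3.0 and later: individual named fields.
    emitCommon(MD, Info);
  }
}
```

Keeping the 3.0+ path in one helper means callers for different calling conventions can share it, which appears to be the motivation for the refactor requested in review.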

https://github.com/llvm/llvm-project/pull/67104


[lld] [lldb] [libcxx] [clang] [libc] [clang-tools-extra] [flang] [llvm] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)

2024-02-05 Thread David Stuttard via cfe-commits


@@ -1025,6 +1025,26 @@ void AMDGPUAsmPrinter::EmitProgramInfoSI(const MachineFunction &MF,
   OutStreamer->emitInt32(MFI->getNumSpilledVGPRs());
 }
 
+// Helper function to add common PAL Metadata 3.0+
+static void EmitPALMetadataCommon(AMDGPUPALMetadata *MD,
+                                  const SIProgramInfo &CurrentProgramInfo,
+                                  CallingConv::ID CC,
+                                  const MCSubtargetInfo &ST) {
+  MD->setHwStage(CC, ".ieee_mode", (bool)CurrentProgramInfo.IEEEMode);

dstutt wrote:

I can easily add that though - and that does mirror the recent change to 
getPGMRsrc1.

https://github.com/llvm/llvm-project/pull/67104


[lld] [lldb] [libcxx] [clang] [libc] [clang-tools-extra] [flang] [llvm] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)

2024-02-05 Thread David Stuttard via cfe-commits

https://github.com/dstutt updated 
https://github.com/llvm/llvm-project/pull/67104

>From 259138920126f09149b488fc54e8d2a7da969ca4 Mon Sep 17 00:00:00 2001
From: David Stuttard 
Date: Thu, 24 Aug 2023 16:45:50 +0100
Subject: [PATCH 1/6] [AMDGPU] Add pal metadata 3.0 support to callable pal
 funcs

---
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |  28 +-
 .../AMDGPU/pal-metadata-3.0-callable.ll   | 290 ++
 2 files changed, 314 insertions(+), 4 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll


[clang-tools-extra] [clang] [lld] [flang] [libc] [libcxx] [llvm] [lldb] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)

2024-02-05 Thread David Stuttard via cfe-commits

https://github.com/dstutt updated 
https://github.com/llvm/llvm-project/pull/67104

>From 259138920126f09149b488fc54e8d2a7da969ca4 Mon Sep 17 00:00:00 2001
From: David Stuttard 
Date: Thu, 24 Aug 2023 16:45:50 +0100
Subject: [PATCH 1/7] [AMDGPU] Add pal metadata 3.0 support to callable pal
 funcs

---
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   |  28 +-
 .../AMDGPU/pal-metadata-3.0-callable.ll   | 290 ++
 2 files changed, 314 insertions(+), 4 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index b2360ce30fd6e..22ecd3656d00a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -1098,10 +1098,30 @@ void AMDGPUAsmPrinter::emitPALFunctionMetadata(const 
MachineFunction &MF) {
   StringRef FnName = MF.getFunction().getName();
   MD->setFunctionScratchSize(FnName, MFI.getStackSize());
 
-  // Set compute registers
-  MD->setRsrc1(CallingConv::AMDGPU_CS,
-   CurrentProgramInfo.getPGMRSrc1(CallingConv::AMDGPU_CS));
-  MD->setRsrc2(CallingConv::AMDGPU_CS, 
CurrentProgramInfo.getComputePGMRSrc2());
+  if (MD->getPALMajorVersion() < 3) {
+// Set compute registers
+MD->setRsrc1(CallingConv::AMDGPU_CS,
+ CurrentProgramInfo.getPGMRSrc1(CallingConv::AMDGPU_CS));
+MD->setRsrc2(CallingConv::AMDGPU_CS,
+ CurrentProgramInfo.getComputePGMRSrc2());
+  } else {
+MD->setHwStage(CallingConv::AMDGPU_CS, ".ieee_mode",
+   (bool)CurrentProgramInfo.IEEEMode);
+MD->setHwStage(CallingConv::AMDGPU_CS, ".wgp_mode",
+   (bool)CurrentProgramInfo.WgpMode);
+MD->setHwStage(CallingConv::AMDGPU_CS, ".mem_ordered",
+   (bool)CurrentProgramInfo.MemOrdered);
+
+MD->setHwStage(CallingConv::AMDGPU_CS, ".trap_present",
+   (bool)CurrentProgramInfo.TrapHandlerEnable);
+MD->setHwStage(CallingConv::AMDGPU_CS, ".excp_en",
+   CurrentProgramInfo.EXCPEnable);
+
+const unsigned LdsDwGranularity = 128;
+MD->setHwStage(CallingConv::AMDGPU_CS, ".lds_size",
+   (unsigned)(CurrentProgramInfo.LdsSize * LdsDwGranularity *
+  sizeof(uint32_t)));
+  }
 
   // Set optional info
   MD->setFunctionLdsSize(FnName, CurrentProgramInfo.LDSSize);
diff --git a/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll 
b/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll
new file mode 100644
index 0..d4a5f61aced61
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/pal-metadata-3.0-callable.ll
@@ -0,0 +1,290 @@
+; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx1100 -verify-machineinstrs < %s | 
FileCheck %s
+
+; CHECK:   .amdgpu_pal_metadata
+; CHECK-NEXT: ---
+; CHECK-NEXT: amdpal.pipelines:
+; CHECK-NEXT:  - .api: Vulkan
+; CHECK-NEXT:    .compute_registers:
+; CHECK-NEXT:      .tg_size_en: true
+; CHECK-NEXT:      .tgid_x_en: false
+; CHECK-NEXT:      .tgid_y_en: false
+; CHECK-NEXT:      .tgid_z_en: false
+; CHECK-NEXT:      .tidig_comp_cnt: 0x1
+; CHECK-NEXT:    .hardware_stages:
+; CHECK-NEXT:      .cs:
+; CHECK-NEXT:        .checksum_value: 0x9444d7d0
+; CHECK-NEXT:        .debug_mode: 0
+; CHECK-NEXT:        .excp_en: 0
+; CHECK-NEXT:        .float_mode: 0xc0
+; CHECK-NEXT:        .ieee_mode: true
+; CHECK-NEXT:        .image_op: false
+; CHECK-NEXT:        .lds_size: 0x200
+; CHECK-NEXT:        .mem_ordered: true
+; CHECK-NEXT:        .sgpr_limit: 0x6a
+; CHECK-NEXT:        .threadgroup_dimensions:
+; CHECK-NEXT:          - 0x1
+; CHECK-NEXT:          - 0x400
+; CHECK-NEXT:          - 0x1
+; CHECK-NEXT:        .trap_present: false
+; CHECK-NEXT:        .user_data_reg_map:
+; CHECK-NEXT:          - 0x1000
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT:          - 0x
+; CHECK-NEXT
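
The .lds_size value written in the PAL 3.0 branch above is a byte count derived from a granule count. The arithmetic can be sketched in isolation as follows. This is only an illustration: ldsSizeInBytes is a hypothetical helper name that does not appear in the patch, and the reading that the 0x200 in the test corresponds to a granule count of 1 is an inference from the numbers, not something the patch states.

```cpp
#include <cstdint>

// Dwords per LDS granule, matching the constant used in the patch.
constexpr unsigned LdsDwGranularity = 128;

// Hypothetical helper (not in the patch): convert an LDS size given in
// granules to bytes, using the same expression as emitPALFunctionMetadata.
constexpr unsigned ldsSizeInBytes(unsigned LdsSizeInGranules) {
  return LdsSizeInGranules * LdsDwGranularity *
         static_cast<unsigned>(sizeof(uint32_t));
}

// One granule is 128 dwords * 4 bytes = 512 bytes = 0x200.
static_assert(ldsSizeInBytes(1) == 0x200, "one granule should be 512 bytes");
```

Under that assumption, a granule count of 1 yields the 0x200 that the CHECK line for .lds_size expects.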

[clang] 7940888 - [AMDGPU] Intrinsic to expose s_wait_event for export ready

2022-11-28 Thread David Stuttard via cfe-commits

Author: David Stuttard
Date: 2022-11-28T11:26:15Z
New Revision: 7940888c5987de2b5cbb4ec45b482df88e822f67

URL: https://github.com/llvm/llvm-project/commit/7940888c5987de2b5cbb4ec45b482df88e822f67
DIFF: https://github.com/llvm/llvm-project/commit/7940888c5987de2b5cbb4ec45b482df88e822f67.diff

LOG: [AMDGPU] Intrinsic to expose s_wait_event for export ready

Differential Revision: https://reviews.llvm.org/D138216

Added: 
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll

Modified: 
clang/include/clang/Basic/BuiltinsAMDGPU.def
clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
llvm/lib/Target/AMDGPU/SOPInstructions.td

Removed: 




diff  --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index d4d16d5a9563d..5e64f830fb850 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -261,6 +261,7 @@ TARGET_BUILTIN(__builtin_amdgcn_image_bvh_intersect_ray_lh, "V4UiWUifV4fV4hV4hV4
 
 // TODO: This is a no-op in wave32. Should the builtin require wavefrontsize64?
 TARGET_BUILTIN(__builtin_amdgcn_permlane64, "UiUi", "nc", "gfx11-insts")
+TARGET_BUILTIN(__builtin_amdgcn_s_wait_event_export_ready, "v", "n", "gfx11-insts")
 
 
 //===----------------------------------------------------------------------===//
 // WMMA builtins.

diff  --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl
index a4f2d610afa83..59a16900fb1a4 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx11.cl
@@ -37,3 +37,9 @@ void test_ds_bvh_stack_rtn(global uint2* out, uint addr, uint data, uint4 data1)
 void test_permlane64(global uint* out, uint a) {
   *out = __builtin_amdgcn_permlane64(a);
 }
+
+// CHECK-LABEL: @test_s_wait_event_export_ready
+// CHECK: call void @llvm.amdgcn.s.wait.event.export.ready
+void test_s_wait_event_export_ready() {
+  __builtin_amdgcn_s_wait_event_export_ready();
+}

diff  --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 8f05eb10920c7..3e9233b1f86f9 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -2067,6 +2067,10 @@ def int_amdgcn_wmma_bf16_16x16x16_bf16 : AMDGPUWmmaIntrinsicOPSEL;
 def int_amdgcn_wmma_i32_16x16x16_iu4   : AMDGPUWmmaIntrinsicIU;
 
+def int_amdgcn_s_wait_event_export_ready :
+  ClangBuiltin<"__builtin_amdgcn_s_wait_event_export_ready">,
+  Intrinsic<[], [], [IntrNoMem, IntrHasSideEffects, IntrWillReturn]
+>;
 
 
 //===----------------------------------------------------------------------===//
 // Deep learning intrinsics.

diff  --git a/llvm/lib/Target/AMDGPU/SOPInstructions.td b/llvm/lib/Target/AMDGPU/SOPInstructions.td
index 674da1f0ae4a5..ce0b0dfc48ced 100644
--- a/llvm/lib/Target/AMDGPU/SOPInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SOPInstructions.td
@@ -1388,7 +1388,9 @@ let SubtargetPredicate = isGFX10Plus in {
 
 let SubtargetPredicate = isGFX11Plus in {
   def S_WAIT_EVENT : SOPP_Pseudo<"s_wait_event", (ins s16imm:$simm16),
- "$simm16">;
+ "$simm16"> {
+   let hasSideEffects = 1;
+ }
   def S_DELAY_ALU : SOPP_Pseudo<"s_delay_alu", (ins DELAY_FLAG:$simm16),
 "$simm16">;
 } // End SubtargetPredicate = isGFX11Plus
@@ -1430,6 +1432,10 @@ def : GCNPat<
   (S_SEXT_I32_I16 $src)
 >;
 
+def : GCNPat <
+  (int_amdgcn_s_wait_event_export_ready),
+(S_WAIT_EVENT (i16 0))
+>;
 
 
 //===----------------------------------------------------------------------===//
 // SOP2 Patterns

diff  --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
new file mode 100644
index 0..3e95e4dec67a2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.wait.event.ll
@@ -0,0 +1,15 @@
+; RUN: llc -global-isel=0 -march=amdgcn -verify-machineinstrs -mcpu=gfx1100 < %s | FileCheck -check-prefix=GCN %s
+; RUN: llc -global-isel -march=amdgcn -verify-machineinstrs -mcpu=gfx1100 < %s | FileCheck -check-prefix=GCN %s
+
+; GCN-LABEL: {{^}}test_wait_event:
+; GCN: s_wait_event 0x0
+
+define amdgpu_ps void @test_wait_event() #0 {
+entry:
+  call void @llvm.amdgcn.s.wait.event.export.ready() #0
+  ret void
+}
+
+declare void @llvm.amdgcn.s.wait.event.export.ready() #0
+
+attributes #0 = { nounwind }





[clang] [llvm] [LLVM] Add intrinsics for v_cvt_pk_norm_{i16, u16}_f16 (PR #135631)

2025-04-15 Thread David Stuttard via cfe-commits

dstutt wrote:

The code formatting check job is failing in a weird way. I can't work out what the issue is.

https://github.com/llvm/llvm-project/pull/135631


[clang] [llvm] [LLVM] Add intrinsics for v_cvt_pk_norm_{i16, u16}_f16 (PR #135631)

2025-04-18 Thread David Stuttard via cfe-commits

dstutt wrote:

Looks like the undefs are causing some issues.
Presumably that's deliberate here? (So we should ignore the failure?)

https://github.com/llvm/llvm-project/pull/135631