llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-llvm-globalisel

Author: None (macurtis-amd)

<details>
<summary>Changes</summary>

Add clang builtins and associated LLVM intrinsics for scoped load/store of 128 bits.

New builtins:
1. `__builtin_amdgcn_global_load_b128` ([documentation](https://github.com/macurtis-amd/llvm-project/blob/global-load-store-b128/clang/docs/LanguageExtensions.rst#__builtin_amdgcn_global_load_b128-and-__builtin_amdgcn_global_store_b128), [test/examples](https://github.com/macurtis-amd/llvm-project/blob/global-load-store-b128/clang/test/CodeGenOpenCL/builtins-amdgcn-global-load-store.cl))
2. `__builtin_amdgcn_global_store_b128` ([documentation](https://github.com/macurtis-amd/llvm-project/blob/global-load-store-b128/clang/docs/LanguageExtensions.rst#__builtin_amdgcn_global_load_b128-and-__builtin_amdgcn_global_store_b128), [test/examples](https://github.com/macurtis-amd/llvm-project/blob/global-load-store-b128/clang/test/CodeGenOpenCL/builtins-amdgcn-global-load-store.cl))

And corresponding intrinsics:
1. `llvm.amdgcn.global.load.b128` ([documentation](https://github.com/macurtis-amd/llvm-project/blob/global-load-store-b128/llvm/docs/AMDGPUUsage.rst) - search for intrinsic name, [test/examples](https://github.com/macurtis-amd/llvm-project/blob/global-load-store-b128/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.global.load.b128.ll))
2. `llvm.amdgcn.global.store.b128` ([documentation](https://github.com/macurtis-amd/llvm-project/blob/global-load-store-b128/llvm/docs/AMDGPUUsage.rst) - search for intrinsic name, [test/examples](https://github.com/macurtis-amd/llvm-project/blob/global-load-store-b128/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.global.load.b128.ll))

These will initially be used by [RCCL](https://github.com/ROCm/rccl) to address some low-level performance issues.
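
For readers who want a feel for how the two builtins might be called, here is a minimal sketch, not taken from the patch. The names `v4u_global_ptr`, `LOAD_B128`, `STORE_B128`, and `copy_b128` are hypothetical. The first branch follows the signatures in the documentation added by this PR and has not been tested against a compiler carrying the patch; the second branch is an assumed host fallback (plain accesses, scope ignored) so that the snippet compiles elsewhere. Macros are used rather than a function taking a `scope` parameter because the builtins require the scope to be a string literal.

```c
#include <assert.h>

/* 128-bit vector of four unsigned ints, as in the PR's documentation. */
typedef unsigned int v4u
    __attribute__((__vector_size__(4 * sizeof(unsigned int))));
static_assert(sizeof(v4u) == 16, "v4u is expected to be 128 bits wide");

#ifndef __has_builtin
#define __has_builtin(x) 0 /* compilers without __has_builtin */
#endif

#if __has_builtin(__builtin_amdgcn_global_load_b128) && \
    __has_builtin(__builtin_amdgcn_global_store_b128)
/* Compiler with the PR's builtins: pointers must be in address space 1. */
typedef v4u __attribute__((address_space(1))) *v4u_global_ptr;
#define LOAD_B128(ptr, scope) \
  __builtin_amdgcn_global_load_b128((ptr), scope)
#define STORE_B128(ptr, data, scope) \
  __builtin_amdgcn_global_store_b128((ptr), (data), scope)
#else
/* Assumed fallback for other compilers: ordinary load/store, scope unused. */
typedef v4u *v4u_global_ptr;
#define LOAD_B128(ptr, scope) ((void)(scope), *(ptr))
#define STORE_B128(ptr, data, scope) ((void)(scope), (void)(*(ptr) = (data)))
#endif

/* Copy one 128-bit value; the scope strings are literals at each call site. */
static void copy_b128(v4u_global_ptr dst, v4u_global_ptr src) {
  v4u value = LOAD_B128(src, "agent");
  STORE_B128(dst, value, "workgroup");
}
```

The scope strings used here (`"agent"`, `"workgroup"`) come from the list of valid values in the PR's documentation; the choice of which scope to pair with the load and the store is arbitrary and only for illustration.
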

---

Patch is 1.74 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/172090.diff

23 Files Affected:

- (modified) clang/docs/LanguageExtensions.rst (+37)
- (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+3)
- (modified) clang/include/clang/Sema/SemaAMDGPU.h (+2)
- (modified) clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp (+20)
- (modified) clang/lib/Sema/SemaAMDGPU.cpp (+16)
- (added) clang/test/CodeGenOpenCL/builtins-amdgcn-global-load-store.cl (+113)
- (added) clang/test/SemaOpenCL/builtins-amdgcn-global-load-store-error.cl (+22)
- (added) clang/test/SemaOpenCL/builtins-amdgcn-global-load-store-target-error.cl (+26)
- (modified) llvm/docs/AMDGPUUsage.rst (+106)
- (modified) llvm/include/llvm/CodeGen/GlobalISel/GIMatchTableExecutor.h (+6)
- (modified) llvm/include/llvm/CodeGen/GlobalISel/GIMatchTableExecutorImpl.h (+9)
- (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+25)
- (modified) llvm/lib/IR/Verifier.cpp (+30-3)
- (modified) llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp (+3)
- (modified) llvm/lib/Target/AMDGPU/FLATInstructions.td (+15)
- (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+22)
- (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.global.load.b128.ll (+30869)
- (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.global.store.b128.ll (+3888)
- (added) llvm/test/CodeGen/AMDGPU/unsupported-global-load.ll (+36)
- (added) llvm/test/CodeGen/AMDGPU/unsupported-global-store.ll (+36)
- (added) llvm/test/Verifier/amdgpu-intrinsics.ll (+66)
- (modified) llvm/utils/TableGen/Common/GlobalISel/GlobalISelMatchTable.cpp (+17)
- (modified) llvm/utils/TableGen/Common/GlobalISel/GlobalISelMatchTable.h (+18)

``````````diff
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index c4b86b203d383..4d4d6ca3fe0bd 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -5243,6 +5243,43 @@ returns the bit at the position of the current lane.
 It is almost equivalent to ``(mask & (1 << lane_id)) != 0``, except that its behavior is only defined if the given mask has the same value for all active lanes of the current wave.
+
+__builtin_amdgcn_global_load_b128 and __builtin_amdgcn_global_store_b128
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Signature:
+
+.. code-block:: c
+
+ typedef __attribute__((__vector_size__(4 * sizeof(unsigned int)))) unsigned int v4u;
+ typedef v4u __attribute__((address_space(1))) *global_ptr_to_v4u;
+
+ v4u __builtin_amdgcn_global_load_b128(
+ v4u __attribute__((address_space(1))) *src,
+ const char *scope);
+
+ void __builtin_amdgcn_global_store_b128(
+ v4u __attribute__((address_space(1))) *dst,
+ v4u data,
+ const char *scope);
+
+Load or store a vector of 4 unsigned integers from or to global memory with
+cache behavior specified by `scope` which must be a string literal.
+
+Valid values for `scope` are:
+
+* ``"wavefront"``
+* ``"workgroup"``
+* ``"agent"``
+* ``""`` (empty string)
+
+These builtins are supported on gfx9, gfx10, gfx11, and gfx12 targets.
+
+They map to the llvm intrinsics ``llvm.amdgcn.global.load.b128`` and
+``llvm.amdgcn.global.store.b128`` documented in `User Guide for AMDGPU Backend
+<https://llvm.org/docs/AMDGPUUsage.html>`_.
+ + ARM/AArch64 Language Extensions ------------------------------- diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index a867144d83928..4bc5b1c16f2ad 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -190,6 +190,9 @@ TARGET_BUILTIN(__builtin_amdgcn_raw_ptr_buffer_atomic_fmax_f64, "ddQbiiIi", "", TARGET_BUILTIN(__builtin_amdgcn_raw_ptr_buffer_load_lds, "vQbv*3IUiiiIiIi", "", "vmem-to-lds-load-insts") TARGET_BUILTIN(__builtin_amdgcn_struct_ptr_buffer_load_lds, "vQbv*3IUiiiiIiIi", "", "vmem-to-lds-load-insts") +TARGET_BUILTIN(__builtin_amdgcn_global_load_b128, "V4UiV4Ui*1cC*", "n", "gfx9-insts") +TARGET_BUILTIN(__builtin_amdgcn_global_store_b128, "vV4Ui*1V4UicC*", "n", "gfx9-insts") + //===----------------------------------------------------------------------===// // Ballot builtins. //===----------------------------------------------------------------------===// diff --git a/clang/include/clang/Sema/SemaAMDGPU.h b/clang/include/clang/Sema/SemaAMDGPU.h index bac812a9d4fcf..556bfb705de67 100644 --- a/clang/include/clang/Sema/SemaAMDGPU.h +++ b/clang/include/clang/Sema/SemaAMDGPU.h @@ -28,6 +28,8 @@ class SemaAMDGPU : public SemaBase { bool checkCoopAtomicFunctionCall(CallExpr *TheCall, bool IsStore); + bool checkScopedMemAccessFunctionCall(CallExpr *TheCall); + bool checkMovDPPFunctionCall(CallExpr *TheCall, unsigned NumArgs, unsigned NumDataArgs); diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp index eabdc370da6b4..384f76e092252 100644 --- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp +++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp @@ -885,6 +885,26 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, llvm::Function *F = CGM.getIntrinsic(IID, {Args[0]->getType()}); return Builder.CreateCall(F, {Args}); } + case AMDGPU::BI__builtin_amdgcn_global_load_b128: + case 
AMDGPU::BI__builtin_amdgcn_global_store_b128: { + const bool IsStore = + BuiltinID == AMDGPU::BI__builtin_amdgcn_global_store_b128; + LLVMContext &Ctx = CGM.getLLVMContext(); + SmallVector<Value *, 5> Args = {EmitScalarExpr(E->getArg(0))}; // addr + if (IsStore) + Args.push_back(EmitScalarExpr(E->getArg(1))); // data + const unsigned ScopeIdx = E->getNumArgs() - 1; + StringRef ScopeLit = + cast<StringLiteral>(E->getArg(ScopeIdx)->IgnoreParenCasts()) + ->getString(); + llvm::MDNode *MD = + llvm::MDNode::get(Ctx, {llvm::MDString::get(Ctx, ScopeLit)}); + Args.push_back(llvm::MetadataAsValue::get(Ctx, MD)); // scope + llvm::Function *F = + CGM.getIntrinsic(IsStore ? Intrinsic::amdgcn_global_store_b128 + : Intrinsic::amdgcn_global_load_b128); + return Builder.CreateCall(F, Args); + } case AMDGPU::BI__builtin_amdgcn_get_fpenv: { Function *F = CGM.getIntrinsic(Intrinsic::get_fpenv, {llvm::Type::getInt64Ty(getLLVMContext())}); diff --git a/clang/lib/Sema/SemaAMDGPU.cpp b/clang/lib/Sema/SemaAMDGPU.cpp index cece22092bb14..72c7bf03f93ad 100644 --- a/clang/lib/Sema/SemaAMDGPU.cpp +++ b/clang/lib/Sema/SemaAMDGPU.cpp @@ -255,6 +255,9 @@ bool SemaAMDGPU::CheckAMDGCNBuiltinFunctionCall(unsigned BuiltinID, (SemaRef.BuiltinConstantArg(TheCall, ArgCount, Result)) || (SemaRef.BuiltinConstantArg(TheCall, (ArgCount - 1), Result)); } + case AMDGPU::BI__builtin_amdgcn_global_load_b128: + case AMDGPU::BI__builtin_amdgcn_global_store_b128: + return checkScopedMemAccessFunctionCall(TheCall); default: return false; } @@ -344,6 +347,19 @@ bool SemaAMDGPU::checkCoopAtomicFunctionCall(CallExpr *TheCall, bool IsStore) { return Fail; } +bool SemaAMDGPU::checkScopedMemAccessFunctionCall(CallExpr *TheCall) { + bool Fail = false; + // Last argument is a string literal + Expr *Arg = TheCall->getArg(TheCall->getNumArgs() - 1); + auto Scope = dyn_cast<StringLiteral>(Arg->IgnoreParenCasts()); + if (!Scope) { + Fail = true; + Diag(TheCall->getBeginLoc(), diag::err_expr_not_string_literal) + << 
Arg->getSourceRange(); + } + return Fail; +} + bool SemaAMDGPU::checkMovDPPFunctionCall(CallExpr *TheCall, unsigned NumArgs, unsigned NumDataArgs) { assert(NumDataArgs <= 2); diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-global-load-store.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-global-load-store.cl new file mode 100644 index 0000000000000..7ffceead747e8 --- /dev/null +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-global-load-store.cl @@ -0,0 +1,113 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals smart +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx950 -emit-llvm -o - %s | FileCheck %s -check-prefixes=GFX,GFX950 +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx9-4-generic -emit-llvm -o - %s | FileCheck %s -check-prefixes=GFX,GFX9_4_GENERIC +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1250 -emit-llvm -o - %s | FileCheck %s -check-prefixes=GFX,GFX1250 +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx12-generic -emit-llvm -o - %s | FileCheck %s -check-prefixes=GFX,GFX12_GENERIC + +typedef __attribute__((__vector_size__(4 * sizeof(unsigned int)))) unsigned int v4u32; +typedef v4u32 __global *global_ptr_to_v4u32; + +//------------------------------------------------------------------------------ +// Store +//------------------------------------------------------------------------------ +// GFX-LABEL: @test_amdgcn_global_store_b128_00( +// GFX-NEXT: entry: +// GFX-NEXT: tail call void @llvm.amdgcn.global.store.b128(ptr addrspace(1) [[PTR:%.*]], <4 x i32> [[DATA:%.*]], metadata [[META4:![0-9]+]]) +// GFX-NEXT: ret void +// +void test_amdgcn_global_store_b128_00(global_ptr_to_v4u32 ptr, v4u32 data) { + __builtin_amdgcn_global_store_b128(ptr, data, "wavefront"); +} + +// GFX-LABEL: @test_amdgcn_global_store_b128_01( +// GFX-NEXT: entry: +// GFX-NEXT: tail call void 
@llvm.amdgcn.global.store.b128(ptr addrspace(1) [[PTR:%.*]], <4 x i32> [[DATA:%.*]], metadata [[META5:![0-9]+]]) +// GFX-NEXT: ret void +// +void test_amdgcn_global_store_b128_01(global_ptr_to_v4u32 ptr, v4u32 data) { + __builtin_amdgcn_global_store_b128(ptr, data, "workgroup"); +} + +// GFX-LABEL: @test_amdgcn_global_store_b128_10( +// GFX-NEXT: entry: +// GFX-NEXT: tail call void @llvm.amdgcn.global.store.b128(ptr addrspace(1) [[PTR:%.*]], <4 x i32> [[DATA:%.*]], metadata [[META6:![0-9]+]]) +// GFX-NEXT: ret void +// +void test_amdgcn_global_store_b128_10(global_ptr_to_v4u32 ptr, v4u32 data) { + __builtin_amdgcn_global_store_b128(ptr, data, "agent"); +} + +// GFX-LABEL: @test_amdgcn_global_store_b128_11( +// GFX-NEXT: entry: +// GFX-NEXT: tail call void @llvm.amdgcn.global.store.b128(ptr addrspace(1) [[PTR:%.*]], <4 x i32> [[DATA:%.*]], metadata [[META7:![0-9]+]]) +// GFX-NEXT: ret void +// +void test_amdgcn_global_store_b128_11(global_ptr_to_v4u32 ptr, v4u32 data) { + __builtin_amdgcn_global_store_b128(ptr, data, ""); +} + +//------------------------------------------------------------------------------ +// Load +//------------------------------------------------------------------------------ +// GFX-LABEL: @test_amdgcn_global_load_b128_00( +// GFX-NEXT: entry: +// GFX-NEXT: [[TMP0:%.*]] = tail call <4 x i32> @llvm.amdgcn.global.load.b128(ptr addrspace(1) [[PTR:%.*]], metadata [[META4]]) +// GFX-NEXT: ret <4 x i32> [[TMP0]] +// +v4u32 test_amdgcn_global_load_b128_00(global_ptr_to_v4u32 ptr) { + return __builtin_amdgcn_global_load_b128(ptr, "wavefront"); +} + +// GFX-LABEL: @test_amdgcn_global_load_b128_01( +// GFX-NEXT: entry: +// GFX-NEXT: [[TMP0:%.*]] = tail call <4 x i32> @llvm.amdgcn.global.load.b128(ptr addrspace(1) [[PTR:%.*]], metadata [[META5]]) +// GFX-NEXT: ret <4 x i32> [[TMP0]] +// +v4u32 test_amdgcn_global_load_b128_01(global_ptr_to_v4u32 ptr) { + return __builtin_amdgcn_global_load_b128(ptr, "workgroup"); +} + +// GFX-LABEL: 
@test_amdgcn_global_load_b128_10( +// GFX-NEXT: entry: +// GFX-NEXT: [[TMP0:%.*]] = tail call <4 x i32> @llvm.amdgcn.global.load.b128(ptr addrspace(1) [[PTR:%.*]], metadata [[META6]]) +// GFX-NEXT: ret <4 x i32> [[TMP0]] +// +v4u32 test_amdgcn_global_load_b128_10(global_ptr_to_v4u32 ptr) { + return __builtin_amdgcn_global_load_b128(ptr, "agent"); +} + +// GFX-LABEL: @test_amdgcn_global_load_b128_11( +// GFX-NEXT: entry: +// GFX-NEXT: [[TMP0:%.*]] = tail call <4 x i32> @llvm.amdgcn.global.load.b128(ptr addrspace(1) [[PTR:%.*]], metadata [[META7]]) +// GFX-NEXT: ret <4 x i32> [[TMP0]] +// +v4u32 test_amdgcn_global_load_b128_11(global_ptr_to_v4u32 ptr) { + return __builtin_amdgcn_global_load_b128(ptr, ""); +} +//. +// GFX950: [[META4]] = !{!"wavefront"} +// GFX950: [[META5]] = !{!"workgroup"} +// GFX950: [[META6]] = !{!"agent"} +// GFX950: [[META7]] = !{!""} +//. +// GFX9_4_GENERIC: [[META4]] = !{!"wavefront"} +// GFX9_4_GENERIC: [[META5]] = !{!"workgroup"} +// GFX9_4_GENERIC: [[META6]] = !{!"agent"} +// GFX9_4_GENERIC: [[META7]] = !{!""} +//. +// GFX1250: [[META4]] = !{!"wavefront"} +// GFX1250: [[META5]] = !{!"workgroup"} +// GFX1250: [[META6]] = !{!"agent"} +// GFX1250: [[META7]] = !{!""} +//. +// GFX12_GENERIC: [[META4]] = !{!"wavefront"} +// GFX12_GENERIC: [[META5]] = !{!"workgroup"} +// GFX12_GENERIC: [[META6]] = !{!"agent"} +// GFX12_GENERIC: [[META7]] = !{!""} +//. +//// NOTE: These prefixes are unused and the list is autogenerated. 
Do not add tests below this line: +// GFX1250: {{.*}} +// GFX12_GENERIC: {{.*}} +// GFX950: {{.*}} +// GFX9_4_GENERIC: {{.*}} diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-global-load-store-error.cl b/clang/test/SemaOpenCL/builtins-amdgcn-global-load-store-error.cl new file mode 100644 index 0000000000000..b21b604baa944 --- /dev/null +++ b/clang/test/SemaOpenCL/builtins-amdgcn-global-load-store-error.cl @@ -0,0 +1,22 @@ +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx950 -S -verify -o - %s +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx9-4-generic -S -verify -o - %s +// REQUIRES: amdgpu-registered-target + +typedef __attribute__((__vector_size__(4 * sizeof(unsigned int)))) unsigned int v4u32; +typedef v4u32 __global *global_ptr_to_v4u32; + +void test_amdgcn_global_store_b128_00(v4u32 *ptr, v4u32 data, const char* scope) { + __builtin_amdgcn_global_store_b128(ptr, data, ""); //expected-error{{passing '__private v4u32 *__private' to parameter of type '__attribute__((__vector_size__(4 * sizeof(unsigned int)))) unsigned int __global *' changes address space of pointer}} +} + +void test_amdgcn_global_store_b128_01(global_ptr_to_v4u32 ptr, v4u32 data, const char* scope) { + __builtin_amdgcn_global_store_b128(ptr, data, scope); //expected-error{{expression is not a string literal}} +} + +v4u32 test_amdgcn_global_load_b128_00(v4u32 *ptr, const char* scope) { + return __builtin_amdgcn_global_load_b128(ptr, ""); //expected-error{{passing '__private v4u32 *__private' to parameter of type '__attribute__((__vector_size__(4 * sizeof(unsigned int)))) unsigned int __global *' changes address space of pointer}} +} + +v4u32 test_amdgcn_global_load_b128_01(global_ptr_to_v4u32 ptr, const char* scope) { + return __builtin_amdgcn_global_load_b128(ptr, scope); //expected-error{{expression is not a string literal}} +} diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-global-load-store-target-error.cl 
b/clang/test/SemaOpenCL/builtins-amdgcn-global-load-store-target-error.cl new file mode 100644 index 0000000000000..ec357c58ef903 --- /dev/null +++ b/clang/test/SemaOpenCL/builtins-amdgcn-global-load-store-target-error.cl @@ -0,0 +1,26 @@ +// We test loads and stores separately because clang only seems to exit after +// the first 'target feature' error. + +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx602 -DTEST_LOAD -S -verify -o - %s +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx705 -DTEST_LOAD -S -verify -o - %s +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx810 -DTEST_LOAD -S -verify -o - %s + +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx602 -DTEST_STORE -S -verify -o - %s +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx705 -DTEST_STORE -S -verify -o - %s +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx810 -DTEST_STORE -S -verify -o - %s +// REQUIRES: amdgpu-registered-target + +typedef __attribute__((__vector_size__(4 * sizeof(unsigned int)))) unsigned int v4u32; +typedef v4u32 __global *global_ptr_to_v4u32; + +#ifdef TEST_LOAD +v4u32 test_amdgcn_global_load_b128_01(global_ptr_to_v4u32 ptr, const char* scope) { + return __builtin_amdgcn_global_load_b128(ptr, ""); // expected-error{{'__builtin_amdgcn_global_load_b128' needs target feature gfx9-insts}} +} +#endif + +#ifdef TEST_STORE +void test_amdgcn_global_store_b128_01(global_ptr_to_v4u32 ptr, v4u32 data, const char* scope) { + __builtin_amdgcn_global_store_b128(ptr, data, ""); // expected-error{{'__builtin_amdgcn_global_store_b128' needs target feature gfx9-insts}} +} +#endif diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst index 7ecf1c1124894..39afd29737156 100644 --- a/llvm/docs/AMDGPUUsage.rst +++ b/llvm/docs/AMDGPUUsage.rst @@ -1596,6 +1596,112 @@ The AMDGPU backend implements the following LLVM IR intrinsics. * 1 - Data cache. 
Instruction cache prefetches are unsafe on invalid address. + + llvm.amdgcn.global.load.b128 This intrinsic is supported on gfx9, gfx10, gfx11, and gfx12 targets. + + Signature: + + .. code-block:: llvm + + <4 x i32> @llvm.amdgcn.global.load.b128( + ptr addrspace(1), ; source + metadata) ; scope - e.g. '!0' where '!0 = !{!"wavegroup"}' + + Reads the value from the source address with cache behavior specified by the scope. + + The following table shows the mapping between valid scope values and target + instruction flags or field values. + + ============== ========================== ========================== ========================== ========================== ========================== + targets instruction ``"wavefront"`` ``"workgroup"`` ``"agent"`` ``""`` (empty string) + ============== ========================== ========================== ========================== ========================== ========================== + gfx90* ``global_load_dwordx4`` ``glc`` ``glc`` + + gfx942, gfx950 ``global_load_dwordx4`` (wave) ``sc0`` (group) ``sc1`` (device) ``sc0 sc1`` (system) + + gfx10* ``global_load_dwordx4`` ``glc`` ``glc dlc`` ``glc dlc`` + + gfx11* ``global_load_dwordx4`` ``glc`` ``glc`` ``glc`` + + gfx120* ``global_load_b128`` (CU) ``scope:SCOPE_SE`` (SE) ``scope:SCOPE_DEV`` (DEV) ``scope:SCOPE_SYS`` (SYS) + + gfx125* ``global_load_b128`` (CU) ``scope:SCOPE_DEV`` (DEV) ``scope:SCOPE_SYS`` (SYS) + ============== ========================== ========================== ========================== ========================== ========================== + + For gfx90*, see "GLC Bit Explained" in the appropriate instruction set reference + (e.g. Chapter 9.1.10 in "AMD Instinct MI100" Instruction Set Architecture Reference + Guide). + + For gfx942 and gfx950 targets, see "Memory Scope and Temporal Controls" in the + appropriate instruction set reference (e.g. Chapter 9.1.10.2 in the "AMD Instinct + MI300" Instruction Set Architecture Reference Guide). 
+
+ For gfx10* targets, see "GLC, DLC and SLC Bit Explained" in the appropriate
+ instruction set reference (e.g. Chapter 8.1.10 in "RDNA 2" Instruction Set Architecture
+ Reference Guide)
+
+ For gfx11* targets, see "Cache Controls: SLC, GLC and DLC" in the appropriate
+ instruction set reference (e.g. Chapter 4.1.1 in "RDNA3" Instruction Set Architecture
+ ... [truncated]
``````````

</details>

https://github.com/llvm/llvm-project/pull/172090
_______________________________________________
cfe-commits mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
