llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-clang-codegen

Author: Alex Voicu (AlexVlx)

<details>
<summary>Changes</summary>

This change adds two semi-magical builtins for AMDGPU:

- `__builtin_amdgcn_processor_is`, which is similar in observable behaviour to `__builtin_cpu_is`, except that it is never "evaluated" at run time;
- `__builtin_amdgcn_is_invocable`, which is behaviourally similar to `__has_builtin`, except that it is not a macro (i.e. not evaluated at preprocessing time).

Neither of these is `constexpr`, even though they get evaluated in Clang when compiling for concrete (i.e. `gfxXXX` / `gfxXXX-generic`) targets, so they shouldn't tear the AST too badly / at all for multi-pass compilation cases like HIP. They can only be used in specific contexts (as the conditions of control structures).

The motivation for adding these is two-fold:

- as a nice to have, it provides an AST-visible way to incorporate architecture specific code, rather than having to rely on macros and the preprocessor, which burn in the choice quite early;
- as a must have, it allows featureful AMDGCN flavoured SPIR-V to be produced, where target specific capability is guarded and chosen or discarded when finalising compilation for a concrete target.

I've tried to keep the overall footprint of the change small. The changes to Sema are a bit unpleasant, but there was a strong desire to have Clang validate these builtins and to constrain their uses, and this was the most compact solution I could come up with (suggestions welcome). Finally, I will note that nothing here is actually AMDGPU specific, so it is possible that in the future, assuming interest from other targets / users, we'd just promote them to generic intrinsics.
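To make the "guarded, then chosen or discarded at finalisation" idea concrete, here is a toy, host-only C++ model. It is not code from the patch. It only assumes that processor predicates are named by prefixing the processor name with `llvm.amdgcn.is.`, as the CodeGen hunk in the diff does; `PredicateTable`, `request`, `finalise` and `demoTwoPhase` are hypothetical names invented for illustration, and the exact-name comparison is a deliberate simplification of whatever the real finalisation step does.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Toy model only: PredicateTable, request and finalise are hypothetical
// names, not part of Clang or LLVM.
struct PredicateTable {
    // Predicate name -> value; std::nullopt means "not yet resolved".
    std::map<std::string, std::optional<bool>> values;

    // "Compiling" for the abstract target: record the predicate and leave
    // it unresolved.
    std::string request(const std::string &processor) {
        std::string name = "llvm.amdgcn.is." + processor;
        values.emplace(name, std::nullopt);
        return name;
    }

    // "Finalising" for a concrete target: every recorded predicate becomes
    // a plain constant (simplified here to an exact name comparison).
    void finalise(const std::string &concreteProcessor) {
        const std::string expected = "llvm.amdgcn.is." + concreteProcessor;
        for (auto &entry : values)
            entry.second = (entry.first == expected);
    }
};

// Small walk-through of the two phases.
inline void demoTwoPhase() {
    PredicateTable table;
    const std::string is900 = table.request("gfx900");
    const std::string is1010 = table.request("gfx1010");
    assert(!table.values.at(is900).has_value());  // still abstract

    table.finalise("gfx900");
    assert(table.values.at(is900).value());       // guarded code is kept
    assert(!table.values.at(is1010).value());     // guarded code is dropped
}
```

The point of the model is only the ordering: the choice is recorded symbolically first and fixed later, whereas a macro-based check fixes it at preprocessing time.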
---
Patch is 59.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/134016.diff

17 Files Affected:

- (modified) clang/docs/LanguageExtensions.rst (+110)
- (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+5)
- (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+10)
- (modified) clang/lib/Basic/Targets/SPIR.cpp (+4)
- (modified) clang/lib/Basic/Targets/SPIR.h (+4)
- (modified) clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp (+29)
- (modified) clang/lib/Sema/SemaExpr.cpp (+157)
- (added) clang/test/CodeGen/amdgpu-builtin-cpu-is.c (+65)
- (added) clang/test/CodeGen/amdgpu-builtin-is-invocable.c (+64)
- (added) clang/test/CodeGen/amdgpu-feature-builtins-invalid-use.cpp (+43)
- (modified) llvm/lib/Target/AMDGPU/AMDGPU.h (+9)
- (added) llvm/lib/Target/AMDGPU/AMDGPUExpandPseudoIntrinsics.cpp (+207)
- (modified) llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def (+2)
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+2-1)
- (modified) llvm/lib/Target/AMDGPU/CMakeLists.txt (+1)
- (added) llvm/test/CodeGen/AMDGPU/amdgpu-expand-feature-predicates-unfoldable.ll (+28)
- (added) llvm/test/CodeGen/AMDGPU/amdgpu-expand-feature-predicates.ll (+359)

``````````diff
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 3b8a9cac6587a..8a7cb75af13e5 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -4920,6 +4920,116 @@ If no address spaces names are provided, all address spaces are fenced.
__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local") __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local", "global") +__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``__builtin_amdgcn_processor_is`` and ``__builtin_amdgcn_is_invocable`` provide +a functional mechanism for programatically querying: + +* the identity of the current target processor; +* the capability of the current target processor to invoke a particular builtin. + +**Syntax**: + +.. code-block:: c + + // When used as the predicate for a control structure + bool __builtin_amdgcn_processor_is(const char*); + bool __builtin_amdgcn_is_invocable(builtin_name); + // Otherwise + void __builtin_amdgcn_processor_is(const char*); + void __builtin_amdgcn_is_invocable(void); + +**Example of use**: + +.. code-block:: c++ + + if (__builtin_amdgcn_processor_is("gfx1201") || + __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var)) + __builtin_amdgcn_s_sleep_var(x); + + if (!__builtin_amdgcn_processor_is("gfx906")) + __builtin_amdgcn_s_wait_event_export_ready(); + else if (__builtin_amdgcn_processor_is("gfx1010") || + __builtin_amdgcn_processor_is("gfx1101")) + __builtin_amdgcn_s_ttracedata_imm(1); + + while (__builtin_amdgcn_processor_is("gfx1101")) *p += x; + + do { *p -= x; } while (__builtin_amdgcn_processor_is("gfx1010")); + + for (; __builtin_amdgcn_processor_is("gfx1201"); ++*p) break; + + if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready)) + __builtin_amdgcn_s_wait_event_export_ready(); + else if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_ttracedata_imm)) + __builtin_amdgcn_s_ttracedata_imm(1); + + do { + *p -= x; + } while (__builtin_amdgcn_is_invocable(__builtin_amdgcn_global_load_tr_b64_i32)); + + for (; __builtin_amdgcn_is_invocable(__builtin_amdgcn_permlane64); ++*p) break; + +**Description**: + +When used as the predicate value of the following control 
structures: + +.. code-block:: c++ + + if (...) + while (...) + do { } while (...) + for (...) + +be it directly, or as arguments to logical operators such as ``!, ||, &&``, the +builtins return a boolean value that: + +* indicates whether the current target matches the argument; the argument MUST + be a string literal and a valid AMDGPU target +* indicates whether the builtin function passed as the argument can be invoked + by the current target; the argument MUST be either a generic or AMDGPU + specific builtin name + +Outside of these contexts, the builtins have a ``void`` returning signature +which prevents their misuse. + +**Example of invalid use**: + +.. code-block:: c++ + + void kernel(int* p, int x, bool (*pfn)(bool), const char* str) { + if (__builtin_amdgcn_processor_is("not_an_amdgcn_gfx_id")) return; + else if (__builtin_amdgcn_processor_is(str)) __builtin_trap(); + + bool a = __builtin_amdgcn_processor_is("gfx906"); + const bool b = !__builtin_amdgcn_processor_is("gfx906"); + const bool c = !__builtin_amdgcn_processor_is("gfx906"); + bool d = __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var); + bool e = !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var); + const auto f = + !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready) + || __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var); + const auto g = + !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready) + || !__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var); + __builtin_amdgcn_processor_is("gfx1201") + ? 
__builtin_amdgcn_s_sleep_var(x) : __builtin_amdgcn_s_sleep(42); + if (pfn(__builtin_amdgcn_processor_is("gfx1200"))) + __builtin_amdgcn_s_sleep_var(x); + + if (__builtin_amdgcn_is_invocable("__builtin_amdgcn_s_sleep_var")) return; + else if (__builtin_amdgcn_is_invocable(x)) __builtin_trap(); + } + +When invoked while compiling for a concrete target, the builtins are evaluated +early by Clang, and never produce any CodeGen effects / have no observable +side-effects in IR. Conversely, when compiling for AMDGCN flavoured SPIR-v, +which is an abstract target, a series of predicate values are implicitly +created. These predicates get resolved when finalizing the compilation process +for a concrete target, and shall reflect the latter's identity and features. +Thus, it is possible to author high-level code, in e.g. HIP, that is target +adaptive in a dynamic fashion, contrary to macro based mechanisms. ARM/AArch64 Language Extensions ------------------------------- diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 44ef404aee72f..5d01a7e75f7e7 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -346,6 +346,11 @@ BUILTIN(__builtin_amdgcn_endpgm, "v", "nr") BUILTIN(__builtin_amdgcn_get_fpenv, "WUi", "n") BUILTIN(__builtin_amdgcn_set_fpenv, "vWUi", "n") +// These are special FE only builtins intended for forwarding the requirements +// to the ME. +BUILTIN(__builtin_amdgcn_processor_is, "vcC*", "nctu") +BUILTIN(__builtin_amdgcn_is_invocable, "v", "nctu") + //===----------------------------------------------------------------------===// // R600-NI only builtins. 
//===----------------------------------------------------------------------===// diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td index 5e45482584946..45f0f9eb88e55 100644 --- a/clang/include/clang/Basic/DiagnosticSemaKinds.td +++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td @@ -13054,4 +13054,14 @@ def err_acc_decl_for_routine // AMDGCN builtins diagnostics def err_amdgcn_global_load_lds_size_invalid_value : Error<"invalid size value">; def note_amdgcn_global_load_lds_size_valid_value : Note<"size must be %select{1, 2, or 4|1, 2, 4, 12 or 16}0">; +def err_amdgcn_processor_is_arg_not_literal + : Error<"the argument to __builtin_amdgcn_processor_is must be a string " + "literal">; +def err_amdgcn_processor_is_arg_invalid_value + : Error<"the argument to __builtin_amdgcn_processor_is must be a valid " + "AMDGCN processor identifier; '%0' is not valid">; +def err_amdgcn_is_invocable_arg_invalid_value + : Error<"the argument to __builtin_amdgcn_is_invocable must be either a " + "target agnostic builtin or an AMDGCN target specific builtin; `%0`" + " is not valid">; } // end of sema component. 
diff --git a/clang/lib/Basic/Targets/SPIR.cpp b/clang/lib/Basic/Targets/SPIR.cpp index 5b5f47f9647a2..eb43d9b0be283 100644 --- a/clang/lib/Basic/Targets/SPIR.cpp +++ b/clang/lib/Basic/Targets/SPIR.cpp @@ -152,3 +152,7 @@ void SPIRV64AMDGCNTargetInfo::setAuxTarget(const TargetInfo *Aux) { Float128Format = DoubleFormat; } } + +bool SPIRV64AMDGCNTargetInfo::isValidCPUName(StringRef CPU) const { + return AMDGPUTI.isValidCPUName(CPU); +} diff --git a/clang/lib/Basic/Targets/SPIR.h b/clang/lib/Basic/Targets/SPIR.h index 78505d66d6f2f..7aa13cbeb89fd 100644 --- a/clang/lib/Basic/Targets/SPIR.h +++ b/clang/lib/Basic/Targets/SPIR.h @@ -432,6 +432,10 @@ class LLVM_LIBRARY_VISIBILITY SPIRV64AMDGCNTargetInfo final } bool hasInt128Type() const override { return TargetInfo::hasInt128Type(); } + + // This is only needed for validating arguments passed to + // __builtin_amdgcn_processor_is + bool isValidCPUName(StringRef Name) const override; }; } // namespace targets diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp index b56b739094ff3..7b1a3815144b4 100644 --- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp +++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp @@ -284,6 +284,18 @@ void CodeGenFunction::AddAMDGPUFenceAddressSpaceMMRA(llvm::Instruction *Inst, Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs)); } +static Value *GetOrInsertAMDGPUPredicate(CodeGenFunction &CGF, Twine Name) { + auto PTy = IntegerType::getInt1Ty(CGF.getLLVMContext()); + + auto P = cast<GlobalVariable>( + CGF.CGM.getModule().getOrInsertGlobal(Name.str(), PTy)); + P->setConstant(true); + P->setExternallyInitialized(true); + + return CGF.Builder.CreateLoad(RawAddress(P, PTy, CharUnits::One(), + KnownNonNull)); +} + Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, const CallExpr *E) { llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent; @@ -585,6 +597,23 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, llvm::Value *Env = EmitScalarExpr(E->getArg(0)); return Builder.CreateCall(F, {Env}); } + case AMDGPU::BI__builtin_amdgcn_processor_is: { + assert(CGM.getTriple().isSPIRV() && + "__builtin_amdgcn_processor_is should never reach CodeGen for " + "concrete targets!"); + StringRef Proc = cast<clang::StringLiteral>(E->getArg(0))->getString(); + return GetOrInsertAMDGPUPredicate(*this, "llvm.amdgcn.is." + Proc); + } + case AMDGPU::BI__builtin_amdgcn_is_invocable: { + assert(CGM.getTriple().isSPIRV() && + "__builtin_amdgcn_is_invocable should never reach CodeGen for " + "concrete targets!"); + auto FD = cast<FunctionDecl>( + cast<DeclRefExpr>(E->getArg(0))->getReferencedDeclOfCallee()); + StringRef RF = + getContext().BuiltinInfo.getRequiredFeatures(FD->getBuiltinID()); + return GetOrInsertAMDGPUPredicate(*this, "llvm.amdgcn.has." + RF); + } case AMDGPU::BI__builtin_amdgcn_read_exec: return EmitAMDGCNBallotForExec(*this, E, Int64Ty, Int64Ty, false); case AMDGPU::BI__builtin_amdgcn_read_exec_lo: diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp index 7cc8374e69d73..24f5262ab3cf4 100644 --- a/clang/lib/Sema/SemaExpr.cpp +++ b/clang/lib/Sema/SemaExpr.cpp @@ -6541,6 +6541,22 @@ ExprResult Sema::BuildCallExpr(Scope *Scope, Expr *Fn, SourceLocation LParenLoc, if (Result.isInvalid()) return ExprError(); Fn = Result.get(); + // The __builtin_amdgcn_is_invocable builtin is special, and will be resolved + // later, when we check boolean conditions, for now we merely forward it + // without any additional checking. 
+ if (Fn->getType() == Context.BuiltinFnTy && ArgExprs.size() == 1 && + ArgExprs[0]->getType() == Context.BuiltinFnTy) { + auto FD = cast<FunctionDecl>(Fn->getReferencedDeclOfCallee()); + + if (FD->getName() == "__builtin_amdgcn_is_invocable") { + auto FnPtrTy = Context.getPointerType(FD->getType()); + auto R = ImpCastExprToType(Fn, FnPtrTy, CK_BuiltinFnToFnPtr).get(); + return CallExpr::Create(Context, R, ArgExprs, Context.VoidTy, + ExprValueKind::VK_PRValue, RParenLoc, + FPOptionsOverride()); + } + } + if (CheckArgsForPlaceholders(ArgExprs)) return ExprError(); @@ -13234,6 +13250,20 @@ inline QualType Sema::CheckBitwiseOperands(ExprResult &LHS, ExprResult &RHS, return InvalidOperands(Loc, LHS, RHS); } +static inline bool IsAMDGPUPredicateBI(Expr *E) { + if (!E->getType()->isVoidType()) + return false; + + if (auto CE = dyn_cast<CallExpr>(E)) { + if (auto BI = CE->getDirectCallee()) + if (BI->getName() == "__builtin_amdgcn_processor_is" || + BI->getName() == "__builtin_amdgcn_is_invocable") + return true; + } + + return false; +} + // C99 6.5.[13,14] inline QualType Sema::CheckLogicalOperands(ExprResult &LHS, ExprResult &RHS, SourceLocation Loc, @@ -13329,6 +13359,9 @@ inline QualType Sema::CheckLogicalOperands(ExprResult &LHS, ExprResult &RHS, // The following is safe because we only use this method for // non-overloadable operands. + if (IsAMDGPUPredicateBI(LHS.get()) && IsAMDGPUPredicateBI(RHS.get())) + return Context.VoidTy; + // C++ [expr.log.and]p1 // C++ [expr.log.or]p1 // The operands are both contextually converted to type bool. 
@@ -15576,6 +15609,38 @@ static bool isOverflowingIntegerType(ASTContext &Ctx, QualType T) { return Ctx.getIntWidth(T) >= Ctx.getIntWidth(Ctx.IntTy); } +static Expr *ExpandAMDGPUPredicateBI(ASTContext &Ctx, CallExpr *CE) { + if (!CE->getBuiltinCallee()) + return CXXBoolLiteralExpr::Create(Ctx, false, Ctx.BoolTy, CE->getExprLoc()); + + if (Ctx.getTargetInfo().getTriple().isSPIRV()) { + CE->setType(Ctx.getLogicalOperationType()); + return CE; + } + + bool P = false; + auto &TI = Ctx.getTargetInfo(); + + if (CE->getDirectCallee()->getName() == "__builtin_amdgcn_processor_is") { + auto GFX = dyn_cast<StringLiteral>(CE->getArg(0)->IgnoreParenCasts()); + auto TID = TI.getTargetID(); + if (GFX && TID) { + auto N = GFX->getString(); + P = TI.isValidCPUName(GFX->getString()) && TID->find(N) == 0; + } + } else { + auto FD = cast<FunctionDecl>(CE->getArg(0)->getReferencedDeclOfCallee()); + + StringRef RF = Ctx.BuiltinInfo.getRequiredFeatures(FD->getBuiltinID()); + llvm::StringMap<bool> CF; + Ctx.getFunctionFeatureMap(CF, FD); + + P = Builtin::evaluateRequiredTargetFeatures(RF, CF); + } + + return CXXBoolLiteralExpr::Create(Ctx, P, Ctx.BoolTy, CE->getExprLoc()); +} + ExprResult Sema::CreateBuiltinUnaryOp(SourceLocation OpLoc, UnaryOperatorKind Opc, Expr *InputExpr, bool IsAfterAmp) { @@ -15753,6 +15818,8 @@ ExprResult Sema::CreateBuiltinUnaryOp(SourceLocation OpLoc, // Vector logical not returns the signed variant of the operand type. 
resultType = GetSignedVectorType(resultType); break; + } else if (IsAMDGPUPredicateBI(InputExpr)) { + break; } else { return ExprError(Diag(OpLoc, diag::err_typecheck_unary_expr) << resultType << Input.get()->getSourceRange()); @@ -20469,6 +20536,88 @@ void Sema::DiagnoseEqualityWithExtraParens(ParenExpr *ParenE) { } } +static bool ValidateAMDGPUPredicateBI(Sema &Sema, CallExpr *CE) { + if (CE->getDirectCallee()->getName() == "__builtin_amdgcn_processor_is") { + auto GFX = dyn_cast<StringLiteral>(CE->getArg(0)->IgnoreParenCasts()); + if (!GFX) { + Sema.Diag(CE->getExprLoc(), + diag::err_amdgcn_processor_is_arg_not_literal); + return false; + } + auto N = GFX->getString(); + if (!Sema.getASTContext().getTargetInfo().isValidCPUName(N) && + (!Sema.getASTContext().getAuxTargetInfo() || + !Sema.getASTContext().getAuxTargetInfo()->isValidCPUName(N))) { + Sema.Diag(CE->getExprLoc(), + diag::err_amdgcn_processor_is_arg_invalid_value) << N; + return false; + } + } else { + auto Arg = CE->getArg(0); + if (!Arg || Arg->getType() != Sema.getASTContext().BuiltinFnTy) { + Sema.Diag(CE->getExprLoc(), + diag::err_amdgcn_is_invocable_arg_invalid_value) << Arg; + return false; + } + } + + return true; +} + +static Expr *MaybeHandleAMDGPUPredicateBI(Sema &Sema, Expr *E, bool &Invalid) { + if (auto UO = dyn_cast<UnaryOperator>(E)) { + auto SE = dyn_cast<CallExpr>(UO->getSubExpr()); + if (IsAMDGPUPredicateBI(SE)) { + assert( + UO->getOpcode() == UnaryOperator::Opcode::UO_LNot && + "__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable " + "can only be used as operands of logical ops!"); + + if (!ValidateAMDGPUPredicateBI(Sema, SE)) { + Invalid = true; + return nullptr; + } + + UO->setSubExpr(ExpandAMDGPUPredicateBI(Sema.getASTContext(), SE)); + UO->setType(Sema.getASTContext().getLogicalOperationType()); + + return UO; + } + } + if (auto BO = dyn_cast<BinaryOperator>(E)) { + auto LHS = dyn_cast<CallExpr>(BO->getLHS()); + auto RHS = dyn_cast<CallExpr>(BO->getRHS()); + if 
(IsAMDGPUPredicateBI(LHS) && IsAMDGPUPredicateBI(RHS)) { + assert( + BO->isLogicalOp() && + "__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable " + "can only be used as operands of logical ops!"); + + if (!ValidateAMDGPUPredicateBI(Sema, LHS) || + !ValidateAMDGPUPredicateBI(Sema, RHS)) { + Invalid = true; + return nullptr; + } + + BO->setLHS(ExpandAMDGPUPredicateBI(Sema.getASTContext(), LHS)); + BO->setRHS(ExpandAMDGPUPredicateBI(Sema.getASTContext(), RHS)); + BO->setType(Sema.getASTContext().getLogicalOperationType()); + + return BO; + } + } + if (auto CE = dyn_cast<CallExpr>(E)) + if (IsAMDGPUPredicateBI(CE)) { + if (!ValidateAMDGPUPredicateBI(Sema, CE)) { + Invalid = true; + return nullptr; + } + return ExpandAMDGPUPredicateBI(Sema.getASTContext(), CE); + } + + return nullptr; +} + ExprResult Sema::CheckBooleanCondition(SourceLocation Loc, Expr *E, bool IsConstexpr) { DiagnoseAssignmentAsCondition(E); @@ -20480,6 +20629,14 @@ ExprResult Sema::CheckBooleanCondition(SourceLocation Loc, Expr *E, E = result.get(); if (!E->isTypeDependent()) { + if (E->getType()->isVoidType()) { + bool IsInvalidPredicate = false; + if (auto BIC = MaybeHandleAMDGPUPredicateBI(*this, E, IsInvalidPredicate)) + return BIC; + else if (IsInvalidPredicate) + return ExprError(); + } + if (getLangOpts().CPlusPlus) return CheckCXXBooleanCondition(E, IsConstexpr); // C++ 6.4p4 diff --git a/clang/test/CodeGen/amdgpu-builtin-cpu-is.c b/clang/test/CodeGen/amdgpu-builtin-cpu-is.c new file mode 100644 index 0000000000000..6e261d9f5d239 --- /dev/null +++ b/clang/test/CodeGen/amdgpu-builtin-cpu-is.c @@ -0,0 +1,65 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals all --version 5 +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx900 -emit-llvm %s -o - | FileCheck --check-prefix=AMDGCN-GFX900 %s +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx1010 -emit-llvm %s -o - | FileCheck --check-prefix=AMDGCN-GFX1010 %s 
+// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -emit-llvm %s -o - | FileCheck --check-prefix=AMDGCNSPIRV %s + +// Test that, depending on triple and, if applicable, target-cpu, one of three +// things happens: +// 1) for gfx900 we emit a call to trap (concrete target, matches) +// 2) for gfx1010 we emit an empty kernel (concrete target, does not match) +// 3) for AMDGCNSPIRV we emit llvm.amdgcn.is.gfx900 as a bool global, and +// load from it to provide the condition a br (abstract target) +//. +// AMDGCN-GFX900: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600 +//. +// AMDGCN-GFX1010: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600 +//. +// AMDGCNSPIRV: @llvm.amdgcn.is.gfx900 = external addrspace(1) externally_initialized constant i1 +//. +// AMDGCN-GFX900-LABEL: define dso_local void @foo( +// AMDGCN-GFX900-SAME: ) #[[ATTR0:[0-9]+]] { +// AMDGCN-GFX900-NEXT: [[ENTRY:.*:]] +// AMDGCN-GFX900-NEXT: call void @llvm.trap() +// AMDGCN-GFX900-NEXT: ret void +// +// AMDGCN-GFX1010-LABEL: define dso_local void @foo( +// AMDGCN-GFX1010-SAME: ) #[[ATTR0:[0-9]+]] { +// AMDGCN-GFX1010-NEXT: [[ENTRY:.*:]] +// AMDGCN-GFX1010-NEXT: ret void +// +// AMDGCNSPIRV-LABEL: define spir_func void @foo( +// AMDGCNSPIRV-SAME: ) addrspace(4) #[[ATTR0:[0-9]+]] { +// AMDGCNSPIRV-NEXT: [[ENTRY:.*:]] +// AMDGCNSPIRV-NEXT: [[TMP0:%.*]] = load i1, ptr addrspace(1) @llvm.amdgcn.is.gfx900, align 1 +// AMDGCNSPIRV-NEXT: br i1 [[TMP0]], label %[[IF_THEN:.*]], label %[[IF_END:.*]] +// AMDGCNSPIRV: [[IF_THEN]]: +// AMDGCNSPIRV-NEXT: call addrspace(4) void @llvm.trap() +// AMDGCNSPIRV-NEXT: br label %[[IF_END]] +// AMDGCNSPIRV: [[IF_END]]: +// AMDGCNSPIRV-NEXT: ret void +// +void foo() { + if (__builtin_cpu_is("gfx90... 
[truncated]
``````````

</details>

https://github.com/llvm/llvm-project/pull/134016
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
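For readers following the Sema hunk in the patch above, the concrete-target case can be sketched as a standalone C++ model. This is an illustration rather than the patch's code: it mirrors only the shape of the check visible in the diff (the name must be a valid processor, and the target ID must begin with it). `isKnownProcessor`, `processorIs` and the `known` list are hypothetical stand-ins for the target-info queries, and the target ID string used in the demo is an assumed example.

```cpp
#include <algorithm>
#include <cassert>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the target's table of valid processor names.
inline bool isKnownProcessor(std::string_view name,
                             const std::vector<std::string_view> &known) {
    return std::find(known.begin(), known.end(), name) != known.end();
}

// Model of the fold for a concrete target: true only if `name` is a known
// processor and the target ID starts with it.
inline bool processorIs(std::string_view targetID, std::string_view name,
                        const std::vector<std::string_view> &known) {
    return isKnownProcessor(name, known) &&
           targetID.substr(0, name.size()) == name;
}

inline void demoProcessorIs() {
    const std::vector<std::string_view> known = {"gfx900", "gfx906", "gfx90a",
                                                 "gfx1010"};
    // Assumed example of a target ID carrying feature suffixes.
    const std::string_view targetID = "gfx90a:sramecc+:xnack-";

    assert(processorIs(targetID, "gfx90a", known));   // leading name matches
    assert(!processorIs(targetID, "gfx900", known));  // different processor
    assert(!processorIs(targetID, "gfx90", known));   // prefix, but not a valid name
}
```

Because the result depends only on compile-time information, the call can be replaced by a boolean literal, which is consistent with the gfx900 and gfx1010 test expectations containing no load or branch.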