[clang-tools-extra] [clang-tidy] add modernize-use-std-numbers (PR #66583)
https://github.com/PiotrZSL requested changes to this pull request. Example: ``` llvm/include/llvm/Support/MathExtras.h:59:31: warning: prefer std::numbers math constant [modernize-use-std-numbers] 59 | inv_sqrt3f = .577350269F, // (0x1.279a74P-1) | ^~~ | std::numbers::egamma_v ``` ``` egammaf = .577215665F ``` Looks like having this check implemented as an multiple matchers isn't a good idea, simply because we pickup first one that match instead a nearest one. This leads to bugs when dealing with proper values. In ideal conditions something like x* 3.14 should be even detected as PI. Also warning message should already say what from std::numbers should be used and how far are current and proposed values from them self. https://github.com/llvm/llvm-project/pull/66583 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Llvm modules on demand bmi (PR #71773)
ChuanqiXu9 wrote: > > There are 2 things in the patch. One is to generate the BMI and the object > > file in one phase (phase here means preprocess, precompile, compile, ...). > > This is the main point of the patch - to do this efficiently. Got it. The we can be more focused. > > > But after we introduced thin BMI, it looks inefficient to write the AST > > twice. So it is on my TODO list after we land the thin BMI patch. BTW, I > > think we should do thin in CodeGen action instead of hacking on > > WrappedASTConsumer. > > I am curious as to why you think that the multiplex AST consumer is a hack - > it seems to be designed exactly for this purpose and existed already (i.e. > not part of this patch). It is not about multiplex AST consumer. It is about WrappedASTConsumer. It is designed for plugins. Also it is a private member function of FrontendAction, the base of frontend actions. I think we should perform new behaviors in sub-actions. It looks not good to perform semantical analysis in FrontendAction... Concretely, I think we need to do this in CodeGenAction. > > > And if we introduce the mechanism to produce BMI for `.cpp`, it implies > > that we need to maintain both paths. It is super embracing to me. > > We do not need two mechanisms, .cppm can take the same path as any other > suffix. Then it implies that we need to discard a bunch of existing codes handling `.cppm`. Otherwise we'll have two mechanisms. > > > > in the AST consumer on the BMI side doing suitable filtering to eliminate > > > the content that is not part of the interface, that is either not needed > > > (or in some cases positively unhelpful to consumers). > > > I believe we should do this in ASTWriters. > > I am strongly against doing more semantic work in the AST reader/writer; that > is just compounding existing layering violations that are already hurting us. Agreed in the higher level. But that requires us to implement at least new AST writers. > > > Also this should be part of thin BMI. > > I am not sure what you mean here - the full AST is required for code-gen - we > can only thin AST either on a separate path (as in this patch) or as a > separate step. I mean it should be successors of https://github.com/llvm/llvm-project/pull/71622. Concretely, now we reduce the function definition in https://github.com/llvm/llvm-project/pull/71622/files#diff-125f472e690aa3d973bc42aa3c5d580226c5c47661551aca2889f960681aa64dR321. https://github.com/llvm/llvm-project/pull/71773 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [clang-tidy] add modernize-use-std-numbers (PR #66583)
https://github.com/PiotrZSL edited https://github.com/llvm/llvm-project/pull/66583 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Llvm modules on demand bmi (PR #71773)
boris-kolpackov wrote: >clang++ -std=c++20 foo.cpp -c -fmodule-file=X=some/dir/X.pcm Hm, according to https://clang.llvm.org/docs/StandardCPlusPlusModules.html this can already be achieved with the `-fmodule-output` options (which I was about to try in `build2`). Is there a reason a different option is used for what seems to be the same functionality. Or am I missing something here? > This is the main point of the patch - to do this efficiently. Again, just want to clarify: as I understand it, this patch solves the scaling issue Ben reported (https://github.com/llvm/llvm-project/issues/60996) but without the thin/fat BMI complications, correct? https://github.com/llvm/llvm-project/pull/71773 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [flang] [libcxx] [llvm] [compiler-rt] [clang-tools-extra] [BOLT] Read .rela.dyn in static non-pie binary (PR #71635)
https://github.com/yota9 updated https://github.com/llvm/llvm-project/pull/71635 >From 1006708c3cff79b9504beb26ea82cadaec3bb594 Mon Sep 17 00:00:00 2001 From: Vladislav Khmelevsky Date: Wed, 8 Nov 2023 11:57:16 +0400 Subject: [PATCH] [BOLT] Read .rela.dyn in static non-pie binary Static non-pie binary doesn't have DYNAMIC segment and BOLT skips reading .rela.dyn segment because of it. But such binaries might have this section for example to store IFUNC relocation which is resolved by linked-in startup files, so force reading this section for static executables. --- bolt/include/bolt/Rewrite/RewriteInstance.h | 1 + bolt/lib/Rewrite/RewriteInstance.cpp| 13 +++ bolt/test/AArch64/ifunc.c | 24 +++-- 3 files changed, 36 insertions(+), 2 deletions(-) diff --git a/bolt/include/bolt/Rewrite/RewriteInstance.h b/bolt/include/bolt/Rewrite/RewriteInstance.h index 2a421c5cfaa4f89..6e9af61d76e30f6 100644 --- a/bolt/include/bolt/Rewrite/RewriteInstance.h +++ b/bolt/include/bolt/Rewrite/RewriteInstance.h @@ -421,6 +421,7 @@ class RewriteInstance { /// Common section names. static StringRef getEHFrameSectionName() { return ".eh_frame"; } + static StringRef getRelaDynSectionName() { return ".rela.dyn"; } /// An instance of the input binary we are processing, externally owned. llvm::object::ELFObjectFileBase *InputFile; diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp index abdbb79e8eb60ef..2d7df15025e3685 100644 --- a/bolt/lib/Rewrite/RewriteInstance.cpp +++ b/bolt/lib/Rewrite/RewriteInstance.cpp @@ -2139,6 +2139,19 @@ void RewriteInstance::processDynamicRelocations() { } // The rest of dynamic relocations - DT_RELA. + // The static executable might have .rela.dyn secion and not have PT_DYNAMIC + if (!DynamicRelocationsSize && BC->IsStaticExecutable) { +ErrorOr DynamicRelSectionOrErr = +BC->getUniqueSectionByName(getRelaDynSectionName()); +if (DynamicRelSectionOrErr) { + DynamicRelocationsAddress = DynamicRelSectionOrErr->getAddress(); + DynamicRelocationsSize = DynamicRelSectionOrErr->getSize(); + const SectionRef &SectionRef = DynamicRelSectionOrErr->getSectionRef(); + DynamicRelativeRelocationsCount = std::distance( + SectionRef.relocation_begin(), SectionRef.relocation_end()); +} + } + if (DynamicRelocationsSize > 0) { ErrorOr DynamicRelSectionOrErr = BC->getSectionForAddress(*DynamicRelocationsAddress); diff --git a/bolt/test/AArch64/ifunc.c b/bolt/test/AArch64/ifunc.c index dea2cf6bd543f0a..8edb913ee70d5c0 100644 --- a/bolt/test/AArch64/ifunc.c +++ b/bolt/test/AArch64/ifunc.c @@ -7,6 +7,20 @@ // RUN: llvm-bolt %t.O0.exe -o %t.O0.bolt.exe \ // RUN: --print-disasm --print-only=_start | \ // RUN: FileCheck --check-prefix=O0_CHECK %s +// RUN: llvm-readelf -aW %t.O0.bolt.exe | \ +// RUN: FileCheck --check-prefix=REL_CHECK %s + +// Non-pie static executable doesn't generate PT_DYNAMIC, check relocation +// is readed successfully and IPLT trampoline has been identified by bolt. +// RUN: %clang %cflags -nostdlib -O3 %s -fuse-ld=lld -no-pie \ +// RUN: -o %t.O3_nopie.exe -Wl,-q +// RUN: llvm-readelf -l %t.O3_nopie.exe | \ +// RUN: FileCheck --check-prefix=NON_DYN_CHECK %s +// RUN: llvm-bolt %t.O3_nopie.exe -o %t.O3_nopie.bolt.exe \ +// RUN: --print-disasm --print-only=_start | \ +// RUN: FileCheck --check-prefix=O3_CHECK %s +// RUN: llvm-readelf -aW %t.O3_nopie.bolt.exe | \ +// RUN: FileCheck --check-prefix=REL_CHECK %s // With -O3 direct call is performed on IPLT trampoline. IPLT trampoline // doesn't have associated symbol. The ifunc symbol has the same address as @@ -16,6 +30,8 @@ // RUN: llvm-bolt %t.O3_pie.exe -o %t.O3_pie.bolt.exe \ // RUN: --print-disasm --print-only=_start | \ // RUN: FileCheck --check-prefix=O3_CHECK %s +// RUN: llvm-readelf -aW %t.O3_pie.bolt.exe | \ +// RUN: FileCheck --check-prefix=REL_CHECK %s // Check that IPLT trampoline located in .plt section are normally handled by // BOLT. The gnu-ld linker doesn't use separate .iplt section. @@ -24,12 +40,16 @@ // RUN: llvm-bolt %t.iplt_O3_pie.exe -o %t.iplt_O3_pie.bolt.exe \ // RUN: --print-disasm --print-only=_start | \ // RUN: FileCheck --check-prefix=O3_CHECK %s +// RUN: llvm-readelf -aW %t.iplt_O3_pie.bolt.exe | \ +// RUN: FileCheck --check-prefix=REL_CHECK %s + +// NON_DYN_CHECK-NOT: DYNAMIC // O0_CHECK: adr x{{[0-9]+}}, ifoo // O3_CHECK: b "{{resolver_foo|ifoo}}{{.*}}@PLT" -#include -#include +// REL_CHECK: R_AARCH64_IRELATIVE [[#%x,REL_SYMB_ADDR:]] +// REL_CHECK: [[#REL_SYMB_ADDR]] {{.*}} FUNC {{.*}} resolver_foo static void foo() {} ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Llvm modules on demand bmi (PR #71773)
ChuanqiXu9 wrote: > > clang++ -std=c++20 foo.cpp -c -fmodule-file=X=some/dir/X.pcm > > Hm, according to https://clang.llvm.org/docs/StandardCPlusPlusModules.html > this can already be achieved with the `-fmodule-output` option (and which I > was about to try in `build2`). Is there a reason a different option is used > for what seems to be the same functionality. Or am I missing something here? > > > This is the main point of the patch - to do this efficiently. > > Again, just want to clarify: as I understand it, this patch solves the > scaling issue Ben reported (#60996) but without the thin/fat BMI > complications, correct? The difference is about the efficiency and the interfaces doesn't change a lot. Previously, in the one phase compilation mode, what clang did actually is: ``` x.cppm -> x.pcm -> x.o ``` That said we compile `x.o` from `x.pcm`. There is a reading BMI process. The goal of the patch is to remove the reading process. https://github.com/llvm/llvm-project/pull/71773 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [clang-tidy] Improve `container-data-pointer` check to use `c_str()` (PR #71304)
@@ -3,13 +3,9 @@ readability-container-data-pointer == -Finds cases where code could use ``data()`` rather than the address of the -element at index 0 in a container. This pattern is commonly used to materialize -a pointer to the backing data of a container. ``std::vector`` and -``std::string`` provide a ``data()`` accessor to retrieve the data pointer which -should be preferred. +Finds cases where code references the address of the element at index 0 in a container and replaces them with calls to ``data()`` or ``c_str()``. PiotrZSL wrote: Still not wrapped on 80 collumn https://github.com/llvm/llvm-project/pull/71304 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [clang-tidy] Improve `container-data-pointer` check to use `c_str()` (PR #71304)
@@ -111,16 +115,18 @@ void ContainerDataPointerCheck::check(const MatchFinder::MatchResult &Result) { MemberExpr>(CE)) ReplacementText = "(" + ReplacementText + ")"; - if (CE->getType()->isPointerType()) -ReplacementText += "->data()"; - else -ReplacementText += ".data()"; + ReplacementText += CE->getType()->isPointerType() ? "->" : "."; + ReplacementText += CStrMethod ? "c_str()" : "data()"; + + std::string Description = PiotrZSL wrote: use llvm::StringRef instead of std::string here https://github.com/llvm/llvm-project/pull/71304 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Refactor `IdentifierInfo::ObjcOrBuiltinID` (PR #71709)
@@ -86,19 +87,26 @@ enum { IdentifierInfoAlignment = 8 }; static constexpr int ObjCOrBuiltinIDBits = 16; /// The "layout" of ObjCOrBuiltinID is: -/// - The first value (0) represents "not a special identifier". -/// - The next (NUM_OBJC_KEYWORDS - 1) values represent ObjCKeywordKinds (not -///including objc_not_keyword). -/// - The next (NUM_INTERESTING_IDENTIFIERS - 1) values represent -///InterestingIdentifierKinds (not including not_interesting). -/// - The rest of the values represent builtin IDs (not including NotBuiltin). -static constexpr int FirstObjCKeywordID = 1; -static constexpr int LastObjCKeywordID = -FirstObjCKeywordID + tok::NUM_OBJC_KEYWORDS - 2; -static constexpr int FirstInterestingIdentifierID = LastObjCKeywordID + 1; -static constexpr int LastInterestingIdentifierID = -FirstInterestingIdentifierID + tok::NUM_INTERESTING_IDENTIFIERS - 2; -static constexpr int FirstBuiltinID = LastInterestingIdentifierID + 1; +/// - ObjCKeywordKind enumerators +/// - InterestingIdentifierKind enumerators +/// - Builtin::ID enumerators +/// - NonSpecialIdentifier +enum class ObjCKeywordOrInterestingOrBuiltin { +#define OBJC_AT_KEYWORD(X) objc_##X, +#include "clang/Basic/TokenKinds.def" + NUM_OBJC_KEYWORDS, Endilll wrote: Not just this enumerator, but all OjbC keywords and interesting identifiers. I consider this a feature, actually, because debuggers would show enumerator name that both makes sense and useful while displaying `ObjCOrBuiltinID` bit-fields. Having `ObjCKeywordOrInterestingOrBuiltin` as a scoped enum should prevent name collisions with any other enum. enumeratrion https://github.com/llvm/llvm-project/pull/71709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [clangd] Use InitLLVM (PR #69119)
https://github.com/hokein approved this pull request. https://github.com/llvm/llvm-project/pull/69119 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Refactor `IdentifierInfo::ObjcOrBuiltinID` (PR #71709)
Endilll wrote: > Oh, I didn't look into the identifier's system before. I took a while to look > at the patch but I failed to understand it and I failed to find the > relationships between this patch and header units... Yeah, the part this PR touches in not the most straightforward one. Thank you for you time! https://github.com/llvm/llvm-project/pull/71709 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Do not clear FP pragma stack when instantiating functions (PR #70646)
tru wrote: Can this be merged and ready for a backport next week? https://github.com/llvm/llvm-project/pull/70646 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [InstCombine] Infer zext nneg flag (PR #71534)
mikaelholmen wrote: I think this patch causes miscompiles. Reproduce with ```opt bbi-88690.ll -passes=instcombine -S -o -``` So with this patch instcombine turns ``` @v_936 = global i16 -3276, align 1 @v_937 = global i24 0, align 1 define i16 @main() { entry: %0 = load i16, ptr @v_936, align 1 %unsclear = and i16 %0, 32767 %resize = zext i16 %unsclear to i24 %unsclear1 = and i24 %resize, 8388607 store i24 %unsclear1, ptr @v_937, align 1 ret i16 0 } ``` into ``` @v_936 = global i16 -3276, align 1 @v_937 = global i24 0, align 1 define i16 @main() { entry: %0 = load i16, ptr @v_936, align 1 %resize = zext nneg i16 %0 to i24 store i24 %resize, ptr @v_937, align 1 ret i16 0 } ``` I.e the and with 32767 (0x7fff) is gone and instead the zext got "nneg"? But the value in v_936 can be, and actually _is_ negative. [bbi-88690.ll.gz](https://github.com/llvm/llvm-project/files/13306009/bbi-88690.ll.gz) https://github.com/llvm/llvm-project/pull/71534 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang-tools-extra] [clang] [PowerPC] Check value uses in ValueBit tracking (PR #66040)
https://github.com/ecnelises updated https://github.com/llvm/llvm-project/pull/66040 >From ebaafdd6d45bb62b1847e60df627dfd96971a22c Mon Sep 17 00:00:00 2001 From: Qiu Chaofan Date: Tue, 12 Sep 2023 10:39:55 +0800 Subject: [PATCH] [PowerPC] Check value uses in ValueBit tracking --- llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp | 162 +++--- llvm/test/CodeGen/PowerPC/int128_ldst.ll | 18 +- .../PowerPC/loop-instr-form-prepare.ll| 6 +- llvm/test/CodeGen/PowerPC/prefer-dqform.ll| 4 +- llvm/test/CodeGen/PowerPC/rldimi.ll | 19 +- 5 files changed, 117 insertions(+), 92 deletions(-) diff --git a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp index b57d185bb638b8c..8af50b10d3c7e1d 100644 --- a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp +++ b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp @@ -1630,30 +1630,41 @@ class BitPermutationSelector { bool &Interesting = ValueEntry->first; SmallVector &Bits = ValueEntry->second; Bits.resize(NumBits); +SDValue LHS = V.getNumOperands() > 0 ? V.getOperand(0) : SDValue(); +SDValue RHS = V.getNumOperands() > 1 ? V.getOperand(1) : SDValue(); switch (V.getOpcode()) { default: break; case ISD::ROTL: - if (isa(V.getOperand(1))) { + if (isa(RHS)) { unsigned RotAmt = V.getConstantOperandVal(1); -const auto &LHSBits = *getValueBits(V.getOperand(0), NumBits).second; - -for (unsigned i = 0; i < NumBits; ++i) - Bits[i] = LHSBits[i < RotAmt ? i + (NumBits - RotAmt) : i - RotAmt]; +if (LHS.hasOneUse()) { + const auto &LHSBits = *getValueBits(LHS, NumBits).second; + for (unsigned i = 0; i < NumBits; ++i) +Bits[i] = LHSBits[i < RotAmt ? i + (NumBits - RotAmt) : i - RotAmt]; +} else { + for (unsigned i = 0; i < NumBits; ++i) +Bits[i] = +ValueBit(LHS, i < RotAmt ? i + (NumBits - RotAmt) : i - RotAmt); +} return std::make_pair(Interesting = true, &Bits); } break; case ISD::SHL: case PPCISD::SHL: - if (isa(V.getOperand(1))) { + if (isa(RHS)) { unsigned ShiftAmt = V.getConstantOperandVal(1); -const auto &LHSBits = *getValueBits(V.getOperand(0), NumBits).second; - -for (unsigned i = ShiftAmt; i < NumBits; ++i) - Bits[i] = LHSBits[i - ShiftAmt]; +if (LHS.hasOneUse()) { + const auto &LHSBits = *getValueBits(LHS, NumBits).second; + for (unsigned i = ShiftAmt; i < NumBits; ++i) +Bits[i] = LHSBits[i - ShiftAmt]; +} else { + for (unsigned i = ShiftAmt; i < NumBits; ++i) +Bits[i] = ValueBit(LHS, i - ShiftAmt); +} for (unsigned i = 0; i < ShiftAmt; ++i) Bits[i] = ValueBit(ValueBit::ConstZero); @@ -1663,13 +1674,17 @@ class BitPermutationSelector { break; case ISD::SRL: case PPCISD::SRL: - if (isa(V.getOperand(1))) { + if (isa(RHS)) { unsigned ShiftAmt = V.getConstantOperandVal(1); -const auto &LHSBits = *getValueBits(V.getOperand(0), NumBits).second; - -for (unsigned i = 0; i < NumBits - ShiftAmt; ++i) - Bits[i] = LHSBits[i + ShiftAmt]; +if (LHS.hasOneUse()) { + const auto &LHSBits = *getValueBits(LHS, NumBits).second; + for (unsigned i = 0; i < NumBits - ShiftAmt; ++i) +Bits[i] = LHSBits[i + ShiftAmt]; +} else { + for (unsigned i = 0; i < NumBits - ShiftAmt; ++i) +Bits[i] = ValueBit(LHS, i + ShiftAmt); +} for (unsigned i = NumBits - ShiftAmt; i < NumBits; ++i) Bits[i] = ValueBit(ValueBit::ConstZero); @@ -1678,23 +1693,27 @@ class BitPermutationSelector { } break; case ISD::AND: - if (isa(V.getOperand(1))) { + if (isa(RHS)) { uint64_t Mask = V.getConstantOperandVal(1); -const SmallVector *LHSBits; +const SmallVector *LHSBits = nullptr; // Mark this as interesting, only if the LHS was also interesting. This // prevents the overall procedure from matching a single immediate 'and' // (which is non-optimal because such an and might be folded with other // things if we don't select it here). -std::tie(Interesting, LHSBits) = getValueBits(V.getOperand(0), NumBits); +if (LHS.hasOneUse()) + std::tie(Interesting, LHSBits) = getValueBits(LHS, NumBits); for (unsigned i = 0; i < NumBits; ++i) - if (((Mask >> i) & 1) == 1) -Bits[i] = (*LHSBits)[i]; - else { + if (((Mask >> i) & 1) == 1) { +if (LHS.hasOneUse()) + Bits[i] = (*LHSBits)[i]; +else + Bits[i] = ValueBit(LHS, i); + } else { // AND instruction masks this bit. If the input is already zero, // we have nothing to
[llvm] [clang-tools-extra] [clang] [PowerPC] Check value uses in ValueBit tracking (PR #66040)
ecnelises wrote: Gentle ping... any comments? https://github.com/llvm/llvm-project/pull/66040 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][analyzer] Improve StdLibraryFunctionsChecker 'readlink' modeling. (PR #71373)
balazske wrote: I tested on vim and the problematic report disappeared, no other changes were detected. https://github.com/llvm/llvm-project/pull/71373 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][analyzer] Improve StdLibraryFunctionsChecker 'readlink' modeling. (PR #71373)
balazske wrote: The checker was already tested on some projects, but much more is needed to find such corner cases. It can be better to manually check the functions for cases when a 0 return value is not possible or only at a special (known) case. https://github.com/llvm/llvm-project/pull/71373 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [clang] [llvm] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jplehr commented: I have only briefly looked at the NVPTX implementation. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { using AMDGPUEventRef = AMDGPUResourceRef; using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy; + /// Common method to invoke a single threaded constructor or destructor + /// kernel by name. + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + const char *Name) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'amdgpu-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(Name, sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Error::success(); jplehr wrote: Is there a specific reason we do not return the error here, but instead consume and return success? Also, I think this should be `Plugin::success()` to not deviate from what is used in the plugin. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { using AMDGPUEventRef = AMDGPUResourceRef; using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy; + /// Common method to invoke a single threaded constructor or destructor + /// kernel by name. + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + const char *Name) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'amdgpu-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(Name, sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Error::success(); +} + +// Allocate and construct the AMDGPU kernel. +GenericKernelTy *AMDGPUKernel = Plugin.allocate(); +if (!AMDGPUKernel) + return Plugin::error("Failed to allocate memory for AMDGPU kernel"); + +new (AMDGPUKernel) AMDGPUKernelTy(Name); +if (auto Err = AMDGPUKernel->initImpl(*this, Image)) + return std::move(Err); + +auto *AsyncInfoPtr = Plugin.allocate<__tgt_async_info>(); +AsyncInfoWrapperTy AsyncInfoWrapper(*this, AsyncInfoPtr); + +if (auto Err = initAsyncInfoImpl(AsyncInfoWrapper)) + return std::move(Err); + +KernelArgsTy KernelArgs = {}; +if (auto Err = AMDGPUKernel->launchImpl(*this, /*NumThread=*/1u, +/*NumBlocks=*/1ul, KernelArgs, +/*Args=*/nullptr, AsyncInfoWrapper)) + return std::move(Err); + +if (auto Err = synchronize(AsyncInfoPtr)) + return std::move(Err); +Error Err = Error::success(); jplehr wrote: Should this be `Plugin::success()` instead here as well? https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang][Sema] Fix qualifier restriction of overriden methods (PR #71696)
@@ -289,3 +289,29 @@ namespace PR8168 { static void foo() {} // expected-error{{'static' member function 'foo' overrides a virtual function}} }; } + +namespace T13 { + class A { + public: +virtual const int* foo(); // expected-note{{overridden virtual function is here}} + }; + + class B: public A { + public: +virtual int* foo(); // expected-error{{return type of virtual function 'foo' is not covariant with the return type of the function it overrides ('int *' has different qualifiers than 'const int *')}} + }; +} + +namespace T14 { + struct a {}; + + class A { + public: +virtual const a* foo(); // expected-note{{overridden virtual function is here}} + }; + + class B: public A { + public: +virtual volatile a* foo(); // expected-error{{return type of virtual function 'foo' is not covariant with the return type of the function it overrides (class type 'volatile a *' is more qualified than class type 'const a *')}} ecnelises wrote: Hmm, right, we can't say `volatile` is more qualified than `const` or not. But `virtual volatile a* foo(); ... virtual a* foo() override;` is acceptable as long as `a` is a class-type, so saying `has different qualifiers` also looks inaccurate. https://github.com/llvm/llvm-project/pull/71696 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] 0f7aaeb - [C++20] [Modules] Allow export from language linkage
Author: Chuanqi Xu Date: 2023-11-09T17:44:41+08:00 New Revision: 0f7aaeb3241c3803489a45753190e82dbc7fd5fa URL: https://github.com/llvm/llvm-project/commit/0f7aaeb3241c3803489a45753190e82dbc7fd5fa DIFF: https://github.com/llvm/llvm-project/commit/0f7aaeb3241c3803489a45753190e82dbc7fd5fa.diff LOG: [C++20] [Modules] Allow export from language linkage Close https://github.com/llvm/llvm-project/issues/71347 Previously I misread the concept of module purview. I thought if a declaration attached to a unnamed module, it can't be part of the module purview. But after the issue report, I recognized that module purview is more of a concept about locations instead of semantics. Concretely, the things in the language linkage after module declarations can be exported. This patch refactors `Module::isModulePurview()` and introduces some possible code cleanups. Added: Modified: clang/include/clang/Basic/Module.h clang/include/clang/Lex/ModuleMap.h clang/include/clang/Sema/Sema.h clang/include/clang/Serialization/ASTWriter.h clang/lib/AST/ASTContext.cpp clang/lib/AST/Decl.cpp clang/lib/AST/DeclBase.cpp clang/lib/CodeGen/CodeGenModule.cpp clang/lib/Frontend/ASTUnit.cpp clang/lib/Lex/ModuleMap.cpp clang/lib/Sema/Sema.cpp clang/lib/Sema/SemaDecl.cpp clang/lib/Sema/SemaDeclCXX.cpp clang/lib/Sema/SemaLookup.cpp clang/lib/Sema/SemaModule.cpp clang/lib/Serialization/ASTWriterDecl.cpp clang/test/Modules/export-language-linkage.cppm clang/test/SemaCXX/modules.cppm Removed: diff --git a/clang/include/clang/Basic/Module.h b/clang/include/clang/Basic/Module.h index 239eb5a637f3ecf..08b153e8c1c9d33 100644 --- a/clang/include/clang/Basic/Module.h +++ b/clang/include/clang/Basic/Module.h @@ -178,9 +178,8 @@ class alignas(8) Module { /// eventually be exposed, for use in "private" modules. std::string ExportAsModule; - /// Does this Module scope describe part of the purview of a standard named - /// C++ module? - bool isModulePurview() const { + /// Does this Module is a named module of a standard named module? + bool isNamedModule() const { switch (Kind) { case ModuleInterfaceUnit: case ModuleImplementationUnit: diff --git a/clang/include/clang/Lex/ModuleMap.h b/clang/include/clang/Lex/ModuleMap.h index d5824713970ea7b..32e7e8f899e502c 100644 --- a/clang/include/clang/Lex/ModuleMap.h +++ b/clang/include/clang/Lex/ModuleMap.h @@ -556,8 +556,8 @@ class ModuleMap { /// parent. Module *createGlobalModuleFragmentForModuleUnit(SourceLocation Loc, Module *Parent = nullptr); - Module *createImplicitGlobalModuleFragmentForModuleUnit( - SourceLocation Loc, bool IsExported, Module *Parent = nullptr); + Module *createImplicitGlobalModuleFragmentForModuleUnit(SourceLocation Loc, + Module *Parent); /// Create a global module fragment for a C++ module interface unit. Module *createPrivateModuleFragmentForInterfaceUnit(Module *Parent, diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h index fe8b387f198c56e..63d548c30da7f6e 100644 --- a/clang/include/clang/Sema/Sema.h +++ b/clang/include/clang/Sema/Sema.h @@ -2317,14 +2317,9 @@ class Sema final { clang::Module *TheGlobalModuleFragment = nullptr; /// The implicit global module fragments of the current translation unit. - /// We would only create at most two implicit global module fragments to - /// avoid performance penalties when there are many language linkage - /// exports. /// - /// The contents in the implicit global module fragment can't be discarded - /// no matter if it is exported or not. + /// The contents in the implicit global module fragment can't be discarded. clang::Module *TheImplicitGlobalModuleFragment = nullptr; - clang::Module *TheExportedImplicitGlobalModuleFragment = nullptr; /// Namespace definitions that we will export when they finish. llvm::SmallPtrSet DeferredExportedNamespaces; @@ -2336,9 +2331,7 @@ class Sema final { /// Helper function to judge if we are in module purview. /// Return false if we are not in a module. - bool isCurrentModulePurview() const { -return getCurrentModule() ? getCurrentModule()->isModulePurview() : false; - } + bool isCurrentModulePurview() const; /// Enter the scope of the explicit global module fragment. Module *PushGlobalModuleFragment(SourceLocation BeginLoc); @@ -2346,8 +2339,7 @@ class Sema final { void PopGlobalModuleFragment(); /// Enter the scope of an implicit global module fragment. - Module *PushImplicitGlobalModuleFragment(SourceLocation BeginLoc, - bool IsExported); + Module *PushImplicitGlobalModuleFragment(SourceLocation BeginLoc); /// Leave the scope of an implicit
[clang] [llvm] [InstCombine] Infer zext nneg flag (PR #71534)
dyung wrote: We also have a couple of internal tests that seem to be failing after this commit. Consider the following code: ```c++ char print_tmp[1]; void print(char *, void *data, unsigned size) { unsigned char *bytes = (unsigned char *)data; for (unsigned i = 0; i != size; ++i) sprintf(print_tmp + i * 2, "%02x", bytes[size - 1 - i]); printf(print_tmp); } #define PRINT(VAR) print(#VAR, &VAR, sizeof(VAR)) struct { long b : 17; } test141_struct_id29534; struct test141_struct_id29574_ { test141_struct_id29574_() { INIT(172, *this); } unsigned a : 15; } test141_struct_id29574; int main() { long id29692 = test141_struct_id29534.b = test141_struct_id29574.a; PRINT(id29692); } ``` When compiled without optimizations (and before this change with optimization) it would print out the value `2dac`. But after this change, when optimizations are enabled, the program now prints out `adac`. You can see the difference at https://godbolt.org/z/vjPvGT5G9. https://github.com/llvm/llvm-project/pull/71534 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [compiler-rt] [libcxx] [flang] [llvm] [clang-tools-extra] [Clang][Sema] Fix qualifier restriction of overriden methods (PR #71696)
https://github.com/ecnelises updated https://github.com/llvm/llvm-project/pull/71696 >From 1d0109b7f370a3689a92e20ab52597b112669e47 Mon Sep 17 00:00:00 2001 From: Qiu Chaofan Date: Thu, 9 Nov 2023 00:00:26 +0800 Subject: [PATCH 1/2] [Clang][Sema] Fix qualifier restriction of overriden methods If return type of overriden method is pointer or reference to non-class type, qualifiers cannot be dropped. This also fixes check when qualifier of overriden method's class return type is not subset of super method's. --- .../clang/Basic/DiagnosticSemaKinds.td| 2 +- clang/lib/Sema/SemaDeclCXX.cpp| 15 +- clang/test/SemaCXX/virtual-override.cpp | 28 ++- 3 files changed, 42 insertions(+), 3 deletions(-) diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td index 18c2e861385e463..e60a7513d54e552 100644 --- a/clang/include/clang/Basic/DiagnosticSemaKinds.td +++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td @@ -2115,7 +2115,7 @@ def err_covariant_return_type_different_qualifications : Error< def err_covariant_return_type_class_type_more_qualified : Error< "return type of virtual function %0 is not covariant with the return type of " "the function it overrides (class type %1 is more qualified than class " - "type %2">; + "type %2)">; // C++ implicit special member functions def note_in_declaration_of_implicit_special_member : Note< diff --git a/clang/lib/Sema/SemaDeclCXX.cpp b/clang/lib/Sema/SemaDeclCXX.cpp index 60786a880b9d3fd..b2c1f1fff9d7e7b 100644 --- a/clang/lib/Sema/SemaDeclCXX.cpp +++ b/clang/lib/Sema/SemaDeclCXX.cpp @@ -18469,7 +18469,7 @@ bool Sema::CheckOverridingFunctionReturnType(const CXXMethodDecl *New, // The new class type must have the same or less qualifiers as the old type. - if (NewClassTy.isMoreQualifiedThan(OldClassTy)) { + if (!OldClassTy.isAtLeastAsQualifiedAs(NewClassTy)) { Diag(New->getLocation(), diag::err_covariant_return_type_class_type_more_qualified) << New->getDeclName() << NewTy << OldTy @@ -18479,6 +18479,19 @@ bool Sema::CheckOverridingFunctionReturnType(const CXXMethodDecl *New, return true; } + // Non-class return types should not drop qualifiers in overriden method. + if (!OldClassTy->isStructureOrClassType() && + OldClassTy.getLocalCVRQualifiers() != + NewClassTy.getLocalCVRQualifiers()) { +Diag(New->getLocation(), + diag::err_covariant_return_type_different_qualifications) +<< New->getDeclName() << NewTy << OldTy +<< New->getReturnTypeSourceRange(); +Diag(Old->getLocation(), diag::note_overridden_virtual_function) +<< Old->getReturnTypeSourceRange(); +return true; + } + return false; } diff --git a/clang/test/SemaCXX/virtual-override.cpp b/clang/test/SemaCXX/virtual-override.cpp index 72abfc3cf51e1f7..003f4826a3d6c86 100644 --- a/clang/test/SemaCXX/virtual-override.cpp +++ b/clang/test/SemaCXX/virtual-override.cpp @@ -87,7 +87,7 @@ class A { class B : A { virtual a* f(); - virtual const a* g(); // expected-error{{return type of virtual function 'g' is not covariant with the return type of the function it overrides (class type 'const a *' is more qualified than class type 'a *'}} + virtual const a* g(); // expected-error{{return type of virtual function 'g' is not covariant with the return type of the function it overrides (class type 'const a *' is more qualified than class type 'a *')}} }; } @@ -289,3 +289,29 @@ namespace PR8168 { static void foo() {} // expected-error{{'static' member function 'foo' overrides a virtual function}} }; } + +namespace T13 { + class A { + public: +virtual const int* foo(); // expected-note{{overridden virtual function is here}} + }; + + class B: public A { + public: +virtual int* foo(); // expected-error{{return type of virtual function 'foo' is not covariant with the return type of the function it overrides ('int *' has different qualifiers than 'const int *')}} + }; +} + +namespace T14 { + struct a {}; + + class A { + public: +virtual const a* foo(); // expected-note{{overridden virtual function is here}} + }; + + class B: public A { + public: +virtual volatile a* foo(); // expected-error{{return type of virtual function 'foo' is not covariant with the return type of the function it overrides (class type 'volatile a *' is more qualified than class type 'const a *')}} + }; +} >From 5f64fec64b51542abd72a9a870ae9e5fe357d026 Mon Sep 17 00:00:00 2001 From: Qiu Chaofan Date: Thu, 9 Nov 2023 17:49:33 +0800 Subject: [PATCH 2/2] Say 'different qualifiers' instead of 'more qualified' --- clang/test/SemaCXX/virtual-override.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/clang/test/SemaCXX/virtual-override.cpp b/clang/test/SemaCXX/virtual-override.cpp index 003f4826a3d6c86..3a10e15a663a50a 100644 --- a/clang/test/S
[clang] [compiler-rt] [libcxx] [flang] [llvm] [clang-tools-extra] [Clang][Sema] Fix qualifier restriction of overriden methods (PR #71696)
https://github.com/ecnelises updated https://github.com/llvm/llvm-project/pull/71696 >From 1d0109b7f370a3689a92e20ab52597b112669e47 Mon Sep 17 00:00:00 2001 From: Qiu Chaofan Date: Thu, 9 Nov 2023 00:00:26 +0800 Subject: [PATCH 1/3] [Clang][Sema] Fix qualifier restriction of overriden methods If return type of overriden method is pointer or reference to non-class type, qualifiers cannot be dropped. This also fixes check when qualifier of overriden method's class return type is not subset of super method's. --- .../clang/Basic/DiagnosticSemaKinds.td| 2 +- clang/lib/Sema/SemaDeclCXX.cpp| 15 +- clang/test/SemaCXX/virtual-override.cpp | 28 ++- 3 files changed, 42 insertions(+), 3 deletions(-) diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td index 18c2e861385e463..e60a7513d54e552 100644 --- a/clang/include/clang/Basic/DiagnosticSemaKinds.td +++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td @@ -2115,7 +2115,7 @@ def err_covariant_return_type_different_qualifications : Error< def err_covariant_return_type_class_type_more_qualified : Error< "return type of virtual function %0 is not covariant with the return type of " "the function it overrides (class type %1 is more qualified than class " - "type %2">; + "type %2)">; // C++ implicit special member functions def note_in_declaration_of_implicit_special_member : Note< diff --git a/clang/lib/Sema/SemaDeclCXX.cpp b/clang/lib/Sema/SemaDeclCXX.cpp index 60786a880b9d3fd..b2c1f1fff9d7e7b 100644 --- a/clang/lib/Sema/SemaDeclCXX.cpp +++ b/clang/lib/Sema/SemaDeclCXX.cpp @@ -18469,7 +18469,7 @@ bool Sema::CheckOverridingFunctionReturnType(const CXXMethodDecl *New, // The new class type must have the same or less qualifiers as the old type. - if (NewClassTy.isMoreQualifiedThan(OldClassTy)) { + if (!OldClassTy.isAtLeastAsQualifiedAs(NewClassTy)) { Diag(New->getLocation(), diag::err_covariant_return_type_class_type_more_qualified) << New->getDeclName() << NewTy << OldTy @@ -18479,6 +18479,19 @@ bool Sema::CheckOverridingFunctionReturnType(const CXXMethodDecl *New, return true; } + // Non-class return types should not drop qualifiers in overriden method. + if (!OldClassTy->isStructureOrClassType() && + OldClassTy.getLocalCVRQualifiers() != + NewClassTy.getLocalCVRQualifiers()) { +Diag(New->getLocation(), + diag::err_covariant_return_type_different_qualifications) +<< New->getDeclName() << NewTy << OldTy +<< New->getReturnTypeSourceRange(); +Diag(Old->getLocation(), diag::note_overridden_virtual_function) +<< Old->getReturnTypeSourceRange(); +return true; + } + return false; } diff --git a/clang/test/SemaCXX/virtual-override.cpp b/clang/test/SemaCXX/virtual-override.cpp index 72abfc3cf51e1f7..003f4826a3d6c86 100644 --- a/clang/test/SemaCXX/virtual-override.cpp +++ b/clang/test/SemaCXX/virtual-override.cpp @@ -87,7 +87,7 @@ class A { class B : A { virtual a* f(); - virtual const a* g(); // expected-error{{return type of virtual function 'g' is not covariant with the return type of the function it overrides (class type 'const a *' is more qualified than class type 'a *'}} + virtual const a* g(); // expected-error{{return type of virtual function 'g' is not covariant with the return type of the function it overrides (class type 'const a *' is more qualified than class type 'a *')}} }; } @@ -289,3 +289,29 @@ namespace PR8168 { static void foo() {} // expected-error{{'static' member function 'foo' overrides a virtual function}} }; } + +namespace T13 { + class A { + public: +virtual const int* foo(); // expected-note{{overridden virtual function is here}} + }; + + class B: public A { + public: +virtual int* foo(); // expected-error{{return type of virtual function 'foo' is not covariant with the return type of the function it overrides ('int *' has different qualifiers than 'const int *')}} + }; +} + +namespace T14 { + struct a {}; + + class A { + public: +virtual const a* foo(); // expected-note{{overridden virtual function is here}} + }; + + class B: public A { + public: +virtual volatile a* foo(); // expected-error{{return type of virtual function 'foo' is not covariant with the return type of the function it overrides (class type 'volatile a *' is more qualified than class type 'const a *')}} + }; +} >From 5f64fec64b51542abd72a9a870ae9e5fe357d026 Mon Sep 17 00:00:00 2001 From: Qiu Chaofan Date: Thu, 9 Nov 2023 17:49:33 +0800 Subject: [PATCH 2/3] Say 'different qualifiers' instead of 'more qualified' --- clang/test/SemaCXX/virtual-override.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/clang/test/SemaCXX/virtual-override.cpp b/clang/test/SemaCXX/virtual-override.cpp index 003f4826a3d6c86..3a10e15a663a50a 100644 --- a/clang/test/S
[clang-tools-extra] [llvm] [clang] [CodeGen] Revamp counted_by calculations (PR #70606)
bwendling wrote: @rapidsna My recent commits try to address a lot of the issues you brought up. If the FAM's array index is negative or out of bounds, it should now catch it and return an appropriate value. There may still be some corner cases that have to be hammered out, but I'd like to get this in if you feel it's ready, as I think the corner cases will occur infrequently. https://github.com/llvm/llvm-project/pull/70606 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [InstCombine] Infer zext nneg flag (PR #71534)
dtcxzyw wrote: Reduced test case: https://godbolt.org/z/d4ETPhbno https://github.com/llvm/llvm-project/pull/71534 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [AMDGPU] - Add clang builtins for tied WMMA intrinsics (PR #70669)
https://github.com/OutOfCache updated https://github.com/llvm/llvm-project/pull/70669 >From 75db77fef715fa5aee10a8384fca299b7bf2b7a3 Mon Sep 17 00:00:00 2001 From: Jessica Del Date: Sun, 29 Oct 2023 21:16:52 +0100 Subject: [PATCH] [AMDGPU] - Add clang builtins for tied WMMA intrinsics Add clang builtins for the new tied wmma intrinsics. These variations tie the destination accumulator matrix to the input accumulator matrix. Add negative tests for gfx10, since we do not support the wmma intrinsics before gfx11. --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 4 +++ clang/lib/CodeGen/CGBuiltin.cpp | 14 .../builtins-amdgcn-wmma-w32-gfx10-err.cl | 34 +++ .../CodeGenOpenCL/builtins-amdgcn-wmma-w32.cl | 30 .../builtins-amdgcn-wmma-w64-gfx10-err.cl | 34 +++ .../CodeGenOpenCL/builtins-amdgcn-wmma-w64.cl | 30 6 files changed, 146 insertions(+) create mode 100644 clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w32-gfx10-err.cl create mode 100644 clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w64-gfx10-err.cl diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index 532a91fd903e87c..a19c8bd5f219ec6 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -292,6 +292,8 @@ TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w32, "V8fV16hV16hV8f", "nc TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32, "V8fV16sV16sV8f", "nc", "gfx11-insts") TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w32, "V16hV16hV16hV16hIb", "nc", "gfx11-insts") TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32, "V16sV16sV16sV16sIb", "nc", "gfx11-insts") +TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32, "V16hV16hV16hV16hIb", "nc", "gfx11-insts") +TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32, "V16sV16sV16sV16sIb", "nc", "gfx11-insts") TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32, "V8iIbV4iIbV4iV8iIb", "nc", "gfx11-insts") TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w32, "V8iIbV2iIbV2iV8iIb", "nc", "gfx11-insts") @@ -299,6 +301,8 @@ TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w64, "V4fV16hV16hV4f", "nc TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64, "V4fV16sV16sV4f", "nc", "gfx11-insts") TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w64, "V8hV16hV16hV8hIb", "nc", "gfx11-insts") TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64, "V8sV16sV16sV8sIb", "nc", "gfx11-insts") +TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64, "V8hV16hV16hV8hIb", "nc", "gfx11-insts") +TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64, "V8sV16sV16sV8sIb", "nc", "gfx11-insts") TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64, "V4iIbV4iIbV4iV4iIb", "nc", "gfx11-insts") TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64, "V4iIbV2iIbV2iV4iIb", "nc", "gfx11-insts") diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index d49c44dbaace3a8..f3c989a76cbc380 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -17936,9 +17936,13 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, } case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32: + case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32: case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64: + case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64: case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_w32: + case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32: case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_w64: + case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64: case AMDGPU::BI__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32: case AMDGPU::BI__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64: case AMDGPU::BI__builtin_amdgcn_wmma_f32_16x16x16_f16_w32: @@ -17976,6 +17980,16 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, ArgForMatchingRetType = 2; BuiltinWMMAOp = Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16; break; +case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32: +case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64: + ArgForMatchingRetType = 2; + BuiltinWMMAOp = Intrinsic::amdgcn_wmma_f16_16x16x16_f16_tied; + break; +case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32: +case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64: + ArgForMatchingRetType = 2; + BuiltinWMMAOp = Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16_tied; + break; case AMDGPU::BI__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32: case AMDGPU::BI__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64: ArgForMatchingRetType = 4; diff --git a/clang/test/CodeGenOpenCL/bui
[llvm] [clang] [InstCombine] Infer zext nneg flag (PR #71534)
nikic wrote: It looks like simplifyAssocCastAssoc() is the problematic transform. It modifies a zext in-place without clearing poison flags. https://github.com/llvm/llvm-project/pull/71534 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [InstCombine] Infer zext nneg flag (PR #71534)
nikic wrote: Should be fixed by https://github.com/llvm/llvm-project/commit/1b1c81772fe50a1cb2b2adf8d8cf442c0b73602f. https://github.com/llvm/llvm-project/pull/71534 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][analyzer] Improve StdLibraryFunctionsChecker 'readlink' modeling. (PR #71373)
=?utf-8?q?Bal=C3=A1zs_K=C3=A9ri?= Message-ID: In-Reply-To: https://github.com/DonatNagyE approved this pull request. Thanks for adding the missing TC! https://github.com/llvm/llvm-project/pull/71373 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][Interp] Implement inc/dec for IntegralAP (PR #69597)
https://github.com/tbaederr updated https://github.com/llvm/llvm-project/pull/69597 >From be120871fa8486ce9dd6cabb0a0b27d8371896b8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Timm=20B=C3=A4der?= Date: Wed, 18 Oct 2023 15:36:13 +0200 Subject: [PATCH] [clang][Interp] Implement inc/dec for IntegralAP --- clang/lib/AST/Interp/IntegralAP.h | 12 ++--- clang/test/AST/Interp/intap.cpp | 81 --- 2 files changed, 68 insertions(+), 25 deletions(-) diff --git a/clang/lib/AST/Interp/IntegralAP.h b/clang/lib/AST/Interp/IntegralAP.h index 88de1f1392e6813..82da79a55b05312 100644 --- a/clang/lib/AST/Interp/IntegralAP.h +++ b/clang/lib/AST/Interp/IntegralAP.h @@ -177,17 +177,13 @@ template class IntegralAP final { } static bool increment(IntegralAP A, IntegralAP *R) { -// FIXME: Implement. -assert(false); -*R = IntegralAP(A.V - 1); -return false; +IntegralAP One(1, A.bitWidth()); +return add(A, One, A.bitWidth() + 1, R); } static bool decrement(IntegralAP A, IntegralAP *R) { -// FIXME: Implement. -assert(false); -*R = IntegralAP(A.V - 1); -return false; +IntegralAP One(1, A.bitWidth()); +return sub(A, One, A.bitWidth() + 1, R); } static bool add(IntegralAP A, IntegralAP B, unsigned OpBits, IntegralAP *R) { diff --git a/clang/test/AST/Interp/intap.cpp b/clang/test/AST/Interp/intap.cpp index 34c8d0565082994..73c795732ff1055 100644 --- a/clang/test/AST/Interp/intap.cpp +++ b/clang/test/AST/Interp/intap.cpp @@ -43,9 +43,25 @@ namespace APCast { } #ifdef __SIZEOF_INT128__ +typedef __int128 int128_t; +typedef unsigned __int128 uint128_t; +static const __uint128_t UINT128_MAX =__uint128_t(__int128_t(-1L)); +static_assert(UINT128_MAX == -1, ""); +static_assert(UINT128_MAX == 1, ""); // expected-error {{static assertion failed}} \ + // expected-note {{'340282366920938463463374607431768211455 == 1'}} \ + // ref-error {{static assertion failed}} \ + // ref-note {{'340282366920938463463374607431768211455 == 1'}} + +static const __int128_t INT128_MAX = UINT128_MAX >> (__int128_t)1; +static_assert(INT128_MAX != 0, ""); +static_assert(INT128_MAX == 0, ""); // expected-error {{failed}} \ +// expected-note {{evaluates to '170141183460469231731687303715884105727 == 0'}} \ +// ref-error {{failed}} \ +// ref-note {{evaluates to '170141183460469231731687303715884105727 == 0'}} +static const __int128_t INT128_MIN = -INT128_MAX - 1; + namespace i128 { - typedef __int128 int128_t; - typedef unsigned __int128 uint128_t; + constexpr int128_t I128_1 = 12; static_assert(I128_1 == 12, ""); static_assert(I128_1 != 10, ""); @@ -54,21 +70,6 @@ namespace i128 { // expected-note{{evaluates to}} \ // ref-note{{evaluates to}} - static const __uint128_t UINT128_MAX =__uint128_t(__int128_t(-1L)); - static_assert(UINT128_MAX == -1, ""); - static_assert(UINT128_MAX == 1, ""); // expected-error {{static assertion failed}} \ - // expected-note {{'340282366920938463463374607431768211455 == 1'}} \ - // ref-error {{static assertion failed}} \ - // ref-note {{'340282366920938463463374607431768211455 == 1'}} - - static const __int128_t INT128_MAX = UINT128_MAX >> (__int128_t)1; - static_assert(INT128_MAX != 0, ""); - static_assert(INT128_MAX == 0, ""); // expected-error {{failed}} \ - // expected-note {{evaluates to '170141183460469231731687303715884105727 == 0'}} \ - // ref-error {{failed}} \ - // ref-note {{evaluates to '170141183460469231731687303715884105727 == 0'}} - - static const __int128_t INT128_MIN = -INT128_MAX - 1; constexpr __int128 A = INT128_MAX + 1; // expected-error {{must be initialized by a constant expression}} \ // expected-note {{value 170141183460469231731687303715884105728 is outside the range}} \ // ref-error {{must be initialized by a constant expression}} \ @@ -157,4 +158,50 @@ namespace Bitfields { // expected-warning {{changes value from 100 to 0}} } +namespace IncDec { +#if __cplusplus >= 201402L + constexpr int128_t maxPlus1(bool Pre) { +int128_t a = INT128_MAX; + +if (Pre) + ++a; // ref-note {{value 170141183460469231731687303715884105728 is outside the range}} \ + // expected-note {{value 170141183460469231731687303715884105728 is outside the range}} +else + a++; // ref-note {{value 170141183460469231731687303715884105728 is outside the rang
[llvm] [clang] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (PR #71795)
https://github.com/MDevereau created https://github.com/llvm/llvm-project/pull/71795 Adds the builtins: void svldr_zt(uint64_t zt, const void *rn) void svstr_zt(uint64_t zt, void *rn) And the intrinsics: call void @llvm.aarch64.sme.ldr.zt(i32, ptr) tail call void @llvm.aarch64.sme.str.zt(i32, ptr) Patch by: Kerry McLaughlin >From 9846bc9efd79e6e3c2662ea42367c102df88799d Mon Sep 17 00:00:00 2001 From: Matt Devereau Date: Thu, 9 Nov 2023 10:50:05 + Subject: [PATCH] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics Adds the builtins: void svldr_zt(uint64_t zt, const void *rn) void svstr_zt(uint64_t zt, void *rn) And the intrinsics: call void @llvm.aarch64.sme.ldr.zt(i32, ptr) tail call void @llvm.aarch64.sme.str.zt(i32, ptr) --- clang/include/clang/Basic/arm_sme.td | 5 ++ clang/include/clang/Basic/arm_sve.td | 9 .../acle_sme2_ldr_str_zt.c| 51 +++ llvm/include/llvm/IR/IntrinsicsAArch64.td | 11 ++-- .../Target/AArch64/AArch64ISelDAGToDAG.cpp| 7 ++- .../Target/AArch64/AArch64ISelLowering.cpp| 21 llvm/lib/Target/AArch64/AArch64ISelLowering.h | 2 + .../Target/AArch64/AArch64RegisterInfo.cpp| 6 +++ .../lib/Target/AArch64/AArch64SMEInstrInfo.td | 4 +- llvm/lib/Target/AArch64/SMEInstrFormats.td| 23 +++-- .../CodeGen/AArch64/sme2-intrinsics-zt0.ll| 27 ++ 11 files changed, 153 insertions(+), 13 deletions(-) create mode 100644 clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c create mode 100644 llvm/test/CodeGen/AArch64/sme2-intrinsics-zt0.ll diff --git a/clang/include/clang/Basic/arm_sme.td b/clang/include/clang/Basic/arm_sme.td index b5655afdf419ecf..fe3de56ce3298c5 100644 --- a/clang/include/clang/Basic/arm_sme.td +++ b/clang/include/clang/Basic/arm_sme.td @@ -298,3 +298,8 @@ multiclass ZAAddSub { defm SVADD : ZAAddSub<"add">; defm SVSUB : ZAAddSub<"sub">; + +let TargetGuard = "sme2" in { + def SVLDR_ZT : Inst<"svldr_zt", "viQ", "", MergeNone, "aarch64_sme_ldr_zt", [IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], [ImmCheck<0, ImmCheck0_0>]>; + def SVSTR_ZT : Inst<"svstr_zt", "vi%", "", MergeNone, "aarch64_sme_str_zt", [IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], [ImmCheck<0, ImmCheck0_0>]>; +} diff --git a/clang/include/clang/Basic/arm_sve.td b/clang/include/clang/Basic/arm_sve.td index 3d4c2129565903d..f0b3747898d4145 100644 --- a/clang/include/clang/Basic/arm_sve.td +++ b/clang/include/clang/Basic/arm_sve.td @@ -1813,6 +1813,15 @@ def SVWHILERW_H_BF16 : SInst<"svwhilerw[_{1}]", "Pcc", "b", MergeNone, "aarch64_ def SVWHILEWR_H_BF16 : SInst<"svwhilewr[_{1}]", "Pcc", "b", MergeNone, "aarch64_sve_whilewr_h", [IsOverloadWhileRW]>; } +// // +// // Spill and fill of ZT0 +// // + +// let TargetGuard = "sme2" in { +// def SVLDR_ZT : Inst<"svldr_zt", "viQ", "", "aarch64_sme_ldr_zt", [IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], [ImmCheck<0, ImmCheck0_0>]>; +// def SVSTR_ZT : Inst<"svstr_zt", "vi%", "", "aarch64_sme_str_zt", [IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], [ImmCheck<0, ImmCheck0_0>]>; +// } + // SVE2 - Extended table lookup/permute let TargetGuard = "sve2" in { diff --git a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c new file mode 100644 index 000..3d70ded6b469ba1 --- /dev/null +++ b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c @@ -0,0 +1,51 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py + +// REQUIRES: aarch64-registered-target + +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK +// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s +// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s + +#include + +#ifdef SVE_OVERLOADED_FORMS +// A simple used,unused... macro, long enough to represent any SVE builtin. +#define SVE_ACLE_FUNC(A1,A2_UNUSED) A1 +#else +#define SVE_ACLE_FUNC(A
[llvm] [clang] [InstCombine] Infer zext nneg flag (PR #71534)
mikaelholmen wrote: > Should be fixed by > [1b1c817](https://github.com/llvm/llvm-project/commit/1b1c81772fe50a1cb2b2adf8d8cf442c0b73602f). I've confirmed that the instances of the problem that we saw are fixed by 1b1c81772fe50a. Thanks! https://github.com/llvm/llvm-project/pull/71534 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (PR #71795)
llvmbot wrote: @llvm/pr-subscribers-clang Author: Matthew Devereau (MDevereau) Changes Adds the builtins: void svldr_zt(uint64_t zt, const void *rn) void svstr_zt(uint64_t zt, void *rn) And the intrinsics: call void @llvm.aarch64.sme.ldr.zt(i32, ptr) tail call void @llvm.aarch64.sme.str.zt(i32, ptr) Patch by: Kerry McLaughlin--- Full diff: https://github.com/llvm/llvm-project/pull/71795.diff 11 Files Affected: - (modified) clang/include/clang/Basic/arm_sme.td (+5) - (modified) clang/include/clang/Basic/arm_sve.td (+9) - (added) clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c (+51) - (modified) llvm/include/llvm/IR/IntrinsicsAArch64.td (+7-4) - (modified) llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (+5-2) - (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+21) - (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.h (+2) - (modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp (+6) - (modified) llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td (+2-2) - (modified) llvm/lib/Target/AArch64/SMEInstrFormats.td (+18-5) - (added) llvm/test/CodeGen/AArch64/sme2-intrinsics-zt0.ll (+27) ``diff diff --git a/clang/include/clang/Basic/arm_sme.td b/clang/include/clang/Basic/arm_sme.td index b5655afdf419ecf..fe3de56ce3298c5 100644 --- a/clang/include/clang/Basic/arm_sme.td +++ b/clang/include/clang/Basic/arm_sme.td @@ -298,3 +298,8 @@ multiclass ZAAddSub { defm SVADD : ZAAddSub<"add">; defm SVSUB : ZAAddSub<"sub">; + +let TargetGuard = "sme2" in { + def SVLDR_ZT : Inst<"svldr_zt", "viQ", "", MergeNone, "aarch64_sme_ldr_zt", [IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], [ImmCheck<0, ImmCheck0_0>]>; + def SVSTR_ZT : Inst<"svstr_zt", "vi%", "", MergeNone, "aarch64_sme_str_zt", [IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], [ImmCheck<0, ImmCheck0_0>]>; +} diff --git a/clang/include/clang/Basic/arm_sve.td b/clang/include/clang/Basic/arm_sve.td index 3d4c2129565903d..f0b3747898d4145 100644 --- a/clang/include/clang/Basic/arm_sve.td +++ b/clang/include/clang/Basic/arm_sve.td @@ -1813,6 +1813,15 @@ def SVWHILERW_H_BF16 : SInst<"svwhilerw[_{1}]", "Pcc", "b", MergeNone, "aarch64_ def SVWHILEWR_H_BF16 : SInst<"svwhilewr[_{1}]", "Pcc", "b", MergeNone, "aarch64_sve_whilewr_h", [IsOverloadWhileRW]>; } +// // +// // Spill and fill of ZT0 +// // + +// let TargetGuard = "sme2" in { +// def SVLDR_ZT : Inst<"svldr_zt", "viQ", "", "aarch64_sme_ldr_zt", [IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], [ImmCheck<0, ImmCheck0_0>]>; +// def SVSTR_ZT : Inst<"svstr_zt", "vi%", "", "aarch64_sme_str_zt", [IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], [ImmCheck<0, ImmCheck0_0>]>; +// } + // SVE2 - Extended table lookup/permute let TargetGuard = "sve2" in { diff --git a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c new file mode 100644 index 000..3d70ded6b469ba1 --- /dev/null +++ b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c @@ -0,0 +1,51 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py + +// REQUIRES: aarch64-registered-target + +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK +// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s +// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s + +#include + +#ifdef SVE_OVERLOADED_FORMS +// A simple used,unused... macro, long enough to represent any SVE builtin. +#define SVE_ACLE_FUNC(A1,A2_UNUSED) A1 +#else +#define SVE_ACLE_FUNC(A1,A2) A1##A2 +#endif + +// LDR ZT0 + +// CHECK-LABEL: @test_svldr_zt( +// CHECK-NEXT: entry: +// CHECK-NEXT:tail call void @llvm.aarch64.sme.ldr.zt(i32 0, ptr [[BASE:%.*]]) +// CHECK-NEXT:ret void +// +// CPP-CHECK-LABEL: @_Z13test_svldr_ztPKv( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT:tail call void @llvm.aarch64.sme.ldr.zt(i32 0, ptr [[BASE:%.*]]) +// CPP-CHECK-NEXT:ret void +// +void test
[clang] [llvm] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (PR #71795)
llvmbot wrote: @llvm/pr-subscribers-llvm-ir Author: Matthew Devereau (MDevereau) Changes Adds the builtins: void svldr_zt(uint64_t zt, const void *rn) void svstr_zt(uint64_t zt, void *rn) And the intrinsics: call void @llvm.aarch64.sme.ldr.zt(i32, ptr) tail call void @llvm.aarch64.sme.str.zt(i32, ptr) Patch by: Kerry McLaughlin--- Full diff: https://github.com/llvm/llvm-project/pull/71795.diff 11 Files Affected: - (modified) clang/include/clang/Basic/arm_sme.td (+5) - (modified) clang/include/clang/Basic/arm_sve.td (+9) - (added) clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c (+51) - (modified) llvm/include/llvm/IR/IntrinsicsAArch64.td (+7-4) - (modified) llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (+5-2) - (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+21) - (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.h (+2) - (modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp (+6) - (modified) llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td (+2-2) - (modified) llvm/lib/Target/AArch64/SMEInstrFormats.td (+18-5) - (added) llvm/test/CodeGen/AArch64/sme2-intrinsics-zt0.ll (+27) ``diff diff --git a/clang/include/clang/Basic/arm_sme.td b/clang/include/clang/Basic/arm_sme.td index b5655afdf419ecf..fe3de56ce3298c5 100644 --- a/clang/include/clang/Basic/arm_sme.td +++ b/clang/include/clang/Basic/arm_sme.td @@ -298,3 +298,8 @@ multiclass ZAAddSub { defm SVADD : ZAAddSub<"add">; defm SVSUB : ZAAddSub<"sub">; + +let TargetGuard = "sme2" in { + def SVLDR_ZT : Inst<"svldr_zt", "viQ", "", MergeNone, "aarch64_sme_ldr_zt", [IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], [ImmCheck<0, ImmCheck0_0>]>; + def SVSTR_ZT : Inst<"svstr_zt", "vi%", "", MergeNone, "aarch64_sme_str_zt", [IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], [ImmCheck<0, ImmCheck0_0>]>; +} diff --git a/clang/include/clang/Basic/arm_sve.td b/clang/include/clang/Basic/arm_sve.td index 3d4c2129565903d..f0b3747898d4145 100644 --- a/clang/include/clang/Basic/arm_sve.td +++ b/clang/include/clang/Basic/arm_sve.td @@ -1813,6 +1813,15 @@ def SVWHILERW_H_BF16 : SInst<"svwhilerw[_{1}]", "Pcc", "b", MergeNone, "aarch64_ def SVWHILEWR_H_BF16 : SInst<"svwhilewr[_{1}]", "Pcc", "b", MergeNone, "aarch64_sve_whilewr_h", [IsOverloadWhileRW]>; } +// // +// // Spill and fill of ZT0 +// // + +// let TargetGuard = "sme2" in { +// def SVLDR_ZT : Inst<"svldr_zt", "viQ", "", "aarch64_sme_ldr_zt", [IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], [ImmCheck<0, ImmCheck0_0>]>; +// def SVSTR_ZT : Inst<"svstr_zt", "vi%", "", "aarch64_sme_str_zt", [IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], [ImmCheck<0, ImmCheck0_0>]>; +// } + // SVE2 - Extended table lookup/permute let TargetGuard = "sve2" in { diff --git a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c new file mode 100644 index 000..3d70ded6b469ba1 --- /dev/null +++ b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c @@ -0,0 +1,51 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py + +// REQUIRES: aarch64-registered-target + +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK +// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s +// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s + +#include + +#ifdef SVE_OVERLOADED_FORMS +// A simple used,unused... macro, long enough to represent any SVE builtin. +#define SVE_ACLE_FUNC(A1,A2_UNUSED) A1 +#else +#define SVE_ACLE_FUNC(A1,A2) A1##A2 +#endif + +// LDR ZT0 + +// CHECK-LABEL: @test_svldr_zt( +// CHECK-NEXT: entry: +// CHECK-NEXT:tail call void @llvm.aarch64.sme.ldr.zt(i32 0, ptr [[BASE:%.*]]) +// CHECK-NEXT:ret void +// +// CPP-CHECK-LABEL: @_Z13test_svldr_ztPKv( +// CPP-CHECK-NEXT: entry: +// CPP-CHECK-NEXT:tail call void @llvm.aarch64.sme.ldr.zt(i32 0, ptr [[BASE:%.*]]) +// CPP-CHECK-NEXT:ret void +// +void te
[clang] [clang][Interp] Implement IntegralAP subtraction (PR #71648)
https://github.com/tbaederr updated https://github.com/llvm/llvm-project/pull/71648 >From f1421c190fd480a664bab80281db1e8abb1056a1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Timm=20B=C3=A4der?= Date: Wed, 8 Nov 2023 06:49:41 +0100 Subject: [PATCH] [clang][Interp] Implement IntegralAP subtraction --- clang/lib/AST/Interp/IntegralAP.h | 32 --- clang/test/AST/Interp/intap.cpp | 15 +++ 2 files changed, 27 insertions(+), 20 deletions(-) diff --git a/clang/lib/AST/Interp/IntegralAP.h b/clang/lib/AST/Interp/IntegralAP.h index 88de1f1392e6813..b8e37878ce2f848 100644 --- a/clang/lib/AST/Interp/IntegralAP.h +++ b/clang/lib/AST/Interp/IntegralAP.h @@ -191,12 +191,11 @@ template class IntegralAP final { } static bool add(IntegralAP A, IntegralAP B, unsigned OpBits, IntegralAP *R) { -return CheckAddUB(A, B, OpBits, R); +return CheckAddSubUB(A, B, OpBits, R); } static bool sub(IntegralAP A, IntegralAP B, unsigned OpBits, IntegralAP *R) { -/// FIXME: Gotta check if the result fits into OpBits bits. -return CheckSubUB(A, B, R); +return CheckAddSubUB(A, B, OpBits, R); } static bool mul(IntegralAP A, IntegralAP B, unsigned OpBits, IntegralAP *R) { @@ -264,28 +263,21 @@ template class IntegralAP final { } private: - static bool CheckAddUB(const IntegralAP &A, const IntegralAP &B, - unsigned BitWidth, IntegralAP *R) { -if (!A.isSigned()) { - R->V = A.V + B.V; + template class Op> + static bool CheckAddSubUB(const IntegralAP &A, const IntegralAP &B, +unsigned BitWidth, IntegralAP *R) { +if constexpr (!Signed) { + R->V = Op{}(A.V, B.V); return false; } -const APSInt &LHS = APSInt(A.V, A.isSigned()); -const APSInt &RHS = APSInt(B.V, B.isSigned()); - -APSInt Value(LHS.extend(BitWidth) + RHS.extend(BitWidth), false); +const APSInt &LHS = A.toAPSInt(); +const APSInt &RHS = B.toAPSInt(); +APSInt Value = Op{}(LHS.extend(BitWidth), RHS.extend(BitWidth)); APSInt Result = Value.trunc(LHS.getBitWidth()); -if (Result.extend(BitWidth) != Value) - return true; - R->V = Result; -return false; - } - static bool CheckSubUB(const IntegralAP &A, const IntegralAP &B, - IntegralAP *R) { -R->V = A.V - B.V; -return false; // Success! + +return Result.extend(BitWidth) != Value; } }; diff --git a/clang/test/AST/Interp/intap.cpp b/clang/test/AST/Interp/intap.cpp index 34c8d0565082994..c3cae9a64780d5c 100644 --- a/clang/test/AST/Interp/intap.cpp +++ b/clang/test/AST/Interp/intap.cpp @@ -11,7 +11,12 @@ constexpr _BitInt(2) B = A + 1; constexpr _BitInt(2) C = B + 1; // expected-warning {{from 2 to -2}} \ // ref-warning {{from 2 to -2}} static_assert(C == -2, ""); +static_assert(C - B == A, ""); // expected-error {{not an integral constant expression}} \ + // expected-note {{value -3 is outside the range of representable values}} \ + // ref-error {{not an integral constant expression}} \ + // ref-note {{value -3 is outside the range of representable values}} +static_assert(B - 1 == 0, ""); constexpr MaxBitInt A_ = 0; constexpr MaxBitInt B_ = A_ + 1; @@ -130,6 +135,16 @@ namespace i128 { // expected-warning {{implicit conversion of out of range value}} \ // expected-error {{must be initialized by a constant expression}} \ // expected-note {{is outside the range of representable values of type}} + + constexpr uint128_t Zero = 0; + static_assert((Zero -1) == -1, ""); + constexpr int128_t Five = 5; + static_assert(Five - Zero == Five, ""); + + constexpr int128_t Sub1 = INT128_MIN - 1; // expected-error {{must be initialized by a constant expression}} \ +// expected-note {{-170141183460469231731687303715884105729 is outside the range}} \ +// ref-error {{must be initialized by a constant expression}} \ +// ref-note {{-170141183460469231731687303715884105729 is outside the range}} } namespace AddSubOffset { ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][Interp] Implement IntegralAP subtraction (PR #71648)
tbaederr wrote: Tests should work now https://github.com/llvm/llvm-project/pull/71648 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][Interp] Implement builtin_expect (PR #69713)
Timm =?utf-8?q?B=C3=A4der?= Message-ID: In-Reply-To: tbaederr wrote: Ping https://github.com/llvm/llvm-project/pull/69713 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { using AMDGPUEventRef = AMDGPUResourceRef; using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy; + /// Common method to invoke a single threaded constructor or destructor + /// kernel by name. + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + const char *Name) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'amdgpu-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(Name, sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Error::success(); jhuber6 wrote: If there were any global ctors / dtors the backend will emit a kernel. This is simply encoding "Does this symbol exist? If not continue on". We check the ELF symbol table directly as it's more efficient than going through the device API. We probably need to encode the logic better, since `consumeError` is a bit of a code smell. Maybe a helper function like `Handler.hasGlobal` or something. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { using AMDGPUEventRef = AMDGPUResourceRef; using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy; + /// Common method to invoke a single threaded constructor or destructor + /// kernel by name. + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + const char *Name) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'amdgpu-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(Name, sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Error::success(); +} + +// Allocate and construct the AMDGPU kernel. +GenericKernelTy *AMDGPUKernel = Plugin.allocate(); +if (!AMDGPUKernel) + return Plugin::error("Failed to allocate memory for AMDGPU kernel"); + +new (AMDGPUKernel) AMDGPUKernelTy(Name); +if (auto Err = AMDGPUKernel->initImpl(*this, Image)) + return std::move(Err); + +auto *AsyncInfoPtr = Plugin.allocate<__tgt_async_info>(); +AsyncInfoWrapperTy AsyncInfoWrapper(*this, AsyncInfoPtr); + +if (auto Err = initAsyncInfoImpl(AsyncInfoWrapper)) + return std::move(Err); + +KernelArgsTy KernelArgs = {}; +if (auto Err = AMDGPUKernel->launchImpl(*this, /*NumThread=*/1u, +/*NumBlocks=*/1ul, KernelArgs, +/*Args=*/nullptr, AsyncInfoWrapper)) + return std::move(Err); + +if (auto Err = synchronize(AsyncInfoPtr)) + return std::move(Err); +Error Err = Error::success(); jhuber6 wrote: Yes https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 0a1f4b5d514a5e1525e3178a80f6e8f5638bfb69 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 8 ++ .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + 18 files changed, 291 insertions(+), 216 deletions(-) diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1747,136 +1747,6 @@ llvm::Function *CGOpenMPRuntime::emi
[llvm] [clang] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (PR #71795)
github-actions[bot] wrote: :warning: C/C++ code formatter, clang-format found issues in your code. :warning: You can test this locally with the following command: ``bash git-clang-format --diff 18bb9725619569687bec2c013768511105266a5e 9846bc9efd79e6e3c2662ea42367c102df88799d -- clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp `` View the diff from clang-format here. ``diff diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp index c011a46cf02a..abfe14e52509 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp @@ -326,7 +326,8 @@ public: return false; } - template bool ImmToTile(SDValue N, SDValue &Imm) { + template + bool ImmToTile(SDValue N, SDValue &Imm) { if (auto *CI = dyn_cast(N)) { uint64_t C = CI->getZExtValue(); diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp index c6ff3f1ce6a3..7404e04b8ea2 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp @@ -2754,12 +2754,11 @@ MachineBasicBlock *AArch64TargetLowering::EmitZTSpillFill(MachineInstr &MI, if (IsSpill) { MIB = BuildMI(*BB, MI, MI.getDebugLoc(), TII->get(AArch64::STR_TX)); MIB.addReg(MI.getOperand(0).getReg()); - } - else + } else MIB = BuildMI(*BB, MI, MI.getDebugLoc(), TII->get(AArch64::LDR_TX), MI.getOperand(0).getReg()); MIB.add(MI.getOperand(1)); // Base - MI.eraseFromParent(); // The pseudo is gone now. + MI.eraseFromParent(); // The pseudo is gone now. return BB; } diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp index af2181c0791b..0b4dde5e4d19 100644 --- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp @@ -442,7 +442,7 @@ AArch64RegisterInfo::getStrictlyReservedRegs(const MachineFunction &MF) const { if (MF.getSubtarget().hasSME2()) { for (MCSubRegIterator SubReg(AArch64::ZT0, this, /*self=*/true); - SubReg.isValid(); ++SubReg) + SubReg.isValid(); ++SubReg) Reserved.set(*SubReg); } `` https://github.com/llvm/llvm-project/pull/71795 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[flang] [clang] [flang] add fveclib flag (PR #71734)
@@ -81,6 +81,17 @@ class CodeGenOptions : public CodeGenOptionsBase { RK_WithPattern, // Remark pattern specified via '-Rgroup=regexp'. }; + enum class VectorLibrary { +NoLibrary, // Don't use any vector library. +Accelerate, // Use the Accelerate framework. +LIBMVEC,// GLIBC vector math library. +MASSV, // IBM MASS vector library. +SVML, // Intel short vector math library. +SLEEF, // SLEEF SIMD Library for Evaluating Elementary Functions. +Darwin_libsystem_m, // Use Darwin's libsystem_m vector functions. +ArmPL // Arm Performance Libraries. + }; kiranchandramohan wrote: Can this class be moved to a file in a new directory `llvm/include/llvm/Frontend/Driver` and shared with Clang? https://github.com/llvm/llvm-project/pull/71734 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[flang] [clang] [flang] add fveclib flag (PR #71734)
@@ -843,6 +843,44 @@ getOutputStream(CompilerInstance &ci, llvm::StringRef inFile, llvm_unreachable("Invalid action!"); } +static std::unique_ptr +createTLII(llvm::Triple &targetTriple, const CodeGenOptions &codeGenOpts) { + auto tlii = std::make_unique(targetTriple); + assert(tlii && "Failed to create TargetLibraryInfo"); + + using VecLib = llvm::TargetLibraryInfoImpl::VectorLibrary; + VecLib vecLib = VecLib::NoLibrary; + switch (codeGenOpts.getVecLib()) { + case CodeGenOptions::VectorLibrary::Accelerate: +vecLib = VecLib::Accelerate; +break; + case CodeGenOptions::VectorLibrary::LIBMVEC: +vecLib = VecLib::LIBMVEC_X86; +break; + case CodeGenOptions::VectorLibrary::MASSV: +vecLib = VecLib::MASSV; +break; + case CodeGenOptions::VectorLibrary::SVML: +vecLib = VecLib::SVML; +break; + case CodeGenOptions::VectorLibrary::SLEEF: +vecLib = VecLib::SLEEFGNUABI; +break; + case CodeGenOptions::VectorLibrary::Darwin_libsystem_m: +vecLib = VecLib::DarwinLibSystemM; +break; + case CodeGenOptions::VectorLibrary::ArmPL: +vecLib = VecLib::ArmPL; +break; + case CodeGenOptions::VectorLibrary::NoLibrary: +vecLib = VecLib::NoLibrary; +break; + } + + tlii->addVectorizableFunctionsFromVecLib(vecLib, targetTriple); + return tlii; +} kiranchandramohan wrote: Can this code be moved to `llvm/lib/Frontend/Driver` and shared with Clang? https://github.com/llvm/llvm-project/pull/71734 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jplehr commented: Thanks Joseph. Another two nits. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -671,6 +671,20 @@ struct GenericDeviceTy : public DeviceAllocatorTy { Error synchronize(__tgt_async_info *AsyncInfo); virtual Error synchronizeImpl(__tgt_async_info &AsyncInfo) = 0; + /// Invokes any global constructors on the device if present and is required + /// by the target. + virtual Error callGlobalConstructors(GenericPluginTy &Plugin, + DeviceImageTy &Image) { +return Error::success(); + } + + /// Invokes any global destructors on the device if present and is required + /// by the target. + virtual Error callGlobalDestructors(GenericPluginTy &Plugin, + DeviceImageTy &Image) { +return Error::success(); jplehr wrote: Plugin::success() https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -671,6 +671,20 @@ struct GenericDeviceTy : public DeviceAllocatorTy { Error synchronize(__tgt_async_info *AsyncInfo); virtual Error synchronizeImpl(__tgt_async_info &AsyncInfo) = 0; + /// Invokes any global constructors on the device if present and is required + /// by the target. + virtual Error callGlobalConstructors(GenericPluginTy &Plugin, + DeviceImageTy &Image) { +return Error::success(); jplehr wrote: Plugin::success() https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { using AMDGPUEventRef = AMDGPUResourceRef; using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy; + /// Common method to invoke a single threaded constructor or destructor + /// kernel by name. + Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image, + const char *Name) { +// Perform a quick check for the named kernel in the image. The kernel +// should be created by the 'amdgpu-lower-ctor-dtor' pass. +GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler(); +GlobalTy Global(Name, sizeof(void *)); +if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) { + consumeError(std::move(Err)); + return Error::success(); jplehr wrote: That would certainly make it more obvious. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][Interp] Implement __builtin_bit_cast (PR #68288)
Timm =?utf-8?q?Bäder?= , Timm =?utf-8?q?Bäder?= , Timm =?utf-8?q?Bäder?= , Timm =?utf-8?q?Bäder?= , Timm =?utf-8?q?Bäder?= , Timm =?utf-8?q?Bäder?= , Timm =?utf-8?q?Bäder?= Message-ID: In-Reply-To: @@ -0,0 +1,816 @@ +// RUN: %clang_cc1 -verify -std=c++2a -fsyntax-only -fexperimental-new-constant-interpreter %s +// RUN: %clang_cc1 -verify=ref -std=c++2a -fsyntax-only %s +// RUN: %clang_cc1 -verify -std=c++2a -fsyntax-only -triple aarch64_be-linux-gnu -fexperimental-new-constant-interpreter %s +// RUN: %clang_cc1 -verify=ref -std=c++2a -fsyntax-only -triple aarch64_be-linux-gnu %s +// RUN: %clang_cc1 -verify -std=c++2a -fsyntax-only -fexperimental-new-constant-interpreter -triple powerpc64le-unknown-unknown -mabi=ieeelongdouble %s +// RUN: %clang_cc1 -verify=ref -std=c++2a -fsyntax-only -triple powerpc64le-unknown-unknown -mabi=ieeelongdouble %s +// RUN: %clang_cc1 -verify -std=c++2a -fsyntax-only -fexperimental-new-constant-interpreter -triple powerpc64-unknown-unknown -mabi=ieeelongdouble %s +// RUN: %clang_cc1 -verify=ref -std=c++2a -fsyntax-only -triple powerpc64-unknown-unknown -mabi=ieeelongdouble %s + +/// FIXME: This is a version of +/// clang/test/SemaCXX/constexpr-builtin-bit-cast.cpp with the currently +/// supported subset of operations. They should *all* be supported though. + + +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ +# define LITTLE_END 1 +#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ +# define LITTLE_END 0 +#else +# error "huh?" +#endif + +typedef decltype(nullptr) nullptr_t; + + + +static_assert(sizeof(int) == 4); +static_assert(sizeof(long long) == 8); + +template +constexpr To bit_cast(const From &from) { + static_assert(sizeof(To) == sizeof(From)); + return __builtin_bit_cast(To, from); // ref-note 2{{indeterminate value can only initialize}} \ + // expected-note 2{{indeterminate value can only initialize}} \ + // ref-note {{subexpression not valid}} +} + + +/// Current interpreter does not support this. +/// https://github.com/llvm/llvm-project/issues/63686 +constexpr int FromString = bit_cast("abc"); // ref-error {{must be initialized by a constant expression}} \ + // ref-note {{in call to}} \ + // ref-note {{declared here}} +#if LITTLE_END +static_assert(FromString == 6513249); // ref-error {{is not an integral constant expression}} \ + // ref-note {{initializer of 'FromString' is not a constant expression}} +#else +static_assert(FromString == 1633837824); // ref-error {{is not an integral constant expression}} \ tbaederr wrote: TIL that `constinit` variables aren't usable in constant expressions. But otherwise the test works. https://github.com/llvm/llvm-project/pull/68288 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -671,6 +671,20 @@ struct GenericDeviceTy : public DeviceAllocatorTy { Error synchronize(__tgt_async_info *AsyncInfo); virtual Error synchronizeImpl(__tgt_async_info &AsyncInfo) = 0; + /// Invokes any global constructors on the device if present and is required + /// by the target. + virtual Error callGlobalConstructors(GenericPluginTy &Plugin, + DeviceImageTy &Image) { +return Error::success(); jhuber6 wrote: This code is in the header above the definition of the `Plugin` class, so we can't use that without a complete reordering. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [clang] [llvm] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D, if (D.isNoDestroy(CGM.getContext())) return; + // OpenMP offloading supports C++ constructors and destructors but we do not + // always have 'atexit' available. Instead lower these to use the LLVM global + // destructors which we can handle directly in the runtime. + if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice && + !D.isStaticLocal() && + (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX())) arsenm wrote: Oh look, it's both of my favorite patterns. Can you refine this into something better than language X | language Y and AMDGPU || PTX https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D, if (D.isNoDestroy(CGM.getContext())) return; + // OpenMP offloading supports C++ constructors and destructors but we do not + // always have 'atexit' available. Instead lower these to use the LLVM global + // destructors which we can handle directly in the runtime. + if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice && + !D.isStaticLocal() && + (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX())) jhuber6 wrote: Yeah, these types of things are problematic especially if we consider getting SPIR-V support eventually. The logic basically goes like this. OpenMP supports global destructors but does not always support the `atexit` function. The old logic used to replace everything. This now at least lets CPU based targets use regular handling. I could make this unconditional for OpenMP, but I figured it'd be better to allow the CPU based targets to use the regular handling. More or less this is just a concession to prevent regressions from this patch. The old logic looked like this, which did this unconditionally. Like I said, could remove the AMD and PTX checks and just do this on the CPU as well if it would be better. ```c++ if (CGM.getLangOpts().OMPTargetTriples.empty() && !CGM.getLangOpts().OpenMPIsTargetDevice) return false; ``` https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From 5283c5e08877b11a0eece51ca3877c9f5f8c7b82 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + 18 files changed, 290 insertions(+), 216 deletions(-) diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp +++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp @@ -1747,136 +1747,6 @@ llvm::Function *CGOpenMPRuntime::emit
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D, if (D.isNoDestroy(CGM.getContext())) return; + // OpenMP offloading supports C++ constructors and destructors but we do not + // always have 'atexit' available. Instead lower these to use the LLVM global + // destructors which we can handle directly in the runtime. + if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice && + !D.isStaticLocal() && + (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX())) jhuber6 wrote: Just make this apply to all triples. I don't want to remove the dependency on the OpenMP language because this is somewhat of a hack. We can revisit this later if needed. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[compiler-rt] [llvm] [clang-tools-extra] [clang] [InferAddressSpaces] Fix constant replace to avoid modifying other functions (PR #70611)
https://github.com/arsenm approved this pull request. I think it would be better if we could eliminate ConstantExpr addrspacecasts from the IR altogether, which would avoid most of the complexity here. I would also somewhat prefer to push this DFS into a helper function, but can live with it inline as-is https://github.com/llvm/llvm-project/pull/70611 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][RISCV]: Enable --gcc-install-dir for bare metal targets (PR #71803)
https://github.com/mihailo-stojanovic created https://github.com/llvm/llvm-project/pull/71803 Fix the issue where Baremetal toolchain is created instead of the RISCVToolchain when GCC installation is explicitly passed via the gcc-install-dir option. >From cd5e6d82eb0eb0431f38c48a800c1951d8d4b343 Mon Sep 17 00:00:00 2001 From: Mihailo Stojanovic Date: Tue, 19 Sep 2023 14:30:00 +0300 Subject: [PATCH] [clang][RISCV]: Enable --gcc-install-dir for bare metal targets Fix the issue where Baremetal toolchain is created instead of the RISCVToolchain when GCC installation is explicitly passed via the gcc-install-dir option. --- clang/lib/Driver/ToolChains/RISCVToolchain.cpp | 3 +++ .../riscv64-unknown-elf/include/c++/8.2.0/.keep | 0 .../include/c++/8.2.0/backward/.keep | 0 clang/test/Driver/gcc-install-dir.cpp| 12 4 files changed, 15 insertions(+) create mode 100644 clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/.keep create mode 100644 clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/backward/.keep diff --git a/clang/lib/Driver/ToolChains/RISCVToolchain.cpp b/clang/lib/Driver/ToolChains/RISCVToolchain.cpp index 7e6abd144428783..6b27ea224eb02ee 100644 --- a/clang/lib/Driver/ToolChains/RISCVToolchain.cpp +++ b/clang/lib/Driver/ToolChains/RISCVToolchain.cpp @@ -40,6 +40,9 @@ bool RISCVToolChain::hasGCCToolchain(const Driver &D, if (Args.getLastArg(options::OPT_gcc_toolchain)) return true; + if (Args.getLastArg(options::OPT_gcc_install_dir_EQ)) +return true; + SmallString<128> GCCDir; llvm::sys::path::append(GCCDir, D.Dir, "..", D.getTargetTriple(), "lib/crt0.o"); diff --git a/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/.keep b/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/.keep new file mode 100644 index 000..e69de29bb2d1d64 diff --git a/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/backward/.keep b/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/backward/.keep new file mode 100644 index 000..e69de29bb2d1d64 diff --git a/clang/test/Driver/gcc-install-dir.cpp b/clang/test/Driver/gcc-install-dir.cpp index 955f162a2ce3a19..d22ca545508370d 100644 --- a/clang/test/Driver/gcc-install-dir.cpp +++ b/clang/test/Driver/gcc-install-dir.cpp @@ -37,6 +37,18 @@ // DEBIAN_X86_64_M32-SAME: {{^}}[[SYSROOT]]/usr/lib/gcc/x86_64-linux-gnu/10/32" // DEBIAN_X86_64_M32-SAME: {{^}} "-L[[SYSROOT]]/usr/lib/gcc/x86_64-linux-gnu/10/../../../../lib32" +/// Test GCC installation on bare-metal RISCV64. +// RUN: %clang -### %s --target=riscv64-unknown-elf --sysroot=%S/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/ --stdlib=platform --rtlib=platform \ +// RUN: --gcc-install-dir=%S/Inputs/multilib_riscv_elf_sdk/lib/gcc/riscv64-unknown-elf/8.2.0/ 2>&1 \ +// RUN: | FileCheck %s --check-prefix=ELF_RISCV64 +// ELF_RISCV64: "-internal-isystem" +// ELF_RISCV64-SAME: {{^}} "[[SYSROOT:[^"]+]]/include/c++/8.2.0" +// ELF_RISCV64-SAME: {{^}} "-internal-isystem" "[[SYSROOT]]/include/c++/8.2.0/riscv64-unknown-elf/rv64imac/lp64" +// ELF_RISCV64-SAME: {{^}} "-internal-isystem" "[[SYSROOT]]/include/c++/8.2.0/backward" +// ELF_RISCV64: "-L +// ELF_RISCV64-SAME: {{^}}[[SYSROOT:[^"]+]]/lib/gcc/riscv64-unknown-elf/8.2.0/rv64imac/lp64" +// ELF_RISCV64-SAME: {{^}} "-L[[SYSROOT]]/lib/gcc/riscv64-unknown-elf/8.2.0/../../../../riscv64-unknown-elf/lib/rv64imac/lp64" + // RUN: not %clangxx %s -### --target=x86_64-unknown-linux-gnu --sysroot=%S/Inputs/debian_multiarch_tree \ // RUN: -ccc-install-dir %S/Inputs/basic_linux_tree/usr/bin -resource-dir=%S/Inputs/resource_dir --stdlib=platform --rtlib=platform \ // RUN: --gcc-install-dir=%S/Inputs/debian_multiarch_tree/usr/lib/gcc/x86_64-linux-gnu 2>&1 | FileCheck %s --check-prefix=INVALID ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][RISCV]: Enable --gcc-install-dir for bare metal targets (PR #71803)
llvmbot wrote: @llvm/pr-subscribers-backend-risc-v Author: None (mihailo-stojanovic) Changes Fix the issue where Baremetal toolchain is created instead of the RISCVToolchain when GCC installation is explicitly passed via the gcc-install-dir option. --- Full diff: https://github.com/llvm/llvm-project/pull/71803.diff 4 Files Affected: - (modified) clang/lib/Driver/ToolChains/RISCVToolchain.cpp (+3) - (added) clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/.keep () - (added) clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/backward/.keep () - (modified) clang/test/Driver/gcc-install-dir.cpp (+12) ``diff diff --git a/clang/lib/Driver/ToolChains/RISCVToolchain.cpp b/clang/lib/Driver/ToolChains/RISCVToolchain.cpp index 7e6abd144428783..6b27ea224eb02ee 100644 --- a/clang/lib/Driver/ToolChains/RISCVToolchain.cpp +++ b/clang/lib/Driver/ToolChains/RISCVToolchain.cpp @@ -40,6 +40,9 @@ bool RISCVToolChain::hasGCCToolchain(const Driver &D, if (Args.getLastArg(options::OPT_gcc_toolchain)) return true; + if (Args.getLastArg(options::OPT_gcc_install_dir_EQ)) +return true; + SmallString<128> GCCDir; llvm::sys::path::append(GCCDir, D.Dir, "..", D.getTargetTriple(), "lib/crt0.o"); diff --git a/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/.keep b/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/.keep new file mode 100644 index 000..e69de29bb2d1d64 diff --git a/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/backward/.keep b/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/backward/.keep new file mode 100644 index 000..e69de29bb2d1d64 diff --git a/clang/test/Driver/gcc-install-dir.cpp b/clang/test/Driver/gcc-install-dir.cpp index 955f162a2ce3a19..d22ca545508370d 100644 --- a/clang/test/Driver/gcc-install-dir.cpp +++ b/clang/test/Driver/gcc-install-dir.cpp @@ -37,6 +37,18 @@ // DEBIAN_X86_64_M32-SAME: {{^}}[[SYSROOT]]/usr/lib/gcc/x86_64-linux-gnu/10/32" // DEBIAN_X86_64_M32-SAME: {{^}} "-L[[SYSROOT]]/usr/lib/gcc/x86_64-linux-gnu/10/../../../../lib32" +/// Test GCC installation on bare-metal RISCV64. +// RUN: %clang -### %s --target=riscv64-unknown-elf --sysroot=%S/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/ --stdlib=platform --rtlib=platform \ +// RUN: --gcc-install-dir=%S/Inputs/multilib_riscv_elf_sdk/lib/gcc/riscv64-unknown-elf/8.2.0/ 2>&1 \ +// RUN: | FileCheck %s --check-prefix=ELF_RISCV64 +// ELF_RISCV64: "-internal-isystem" +// ELF_RISCV64-SAME: {{^}} "[[SYSROOT:[^"]+]]/include/c++/8.2.0" +// ELF_RISCV64-SAME: {{^}} "-internal-isystem" "[[SYSROOT]]/include/c++/8.2.0/riscv64-unknown-elf/rv64imac/lp64" +// ELF_RISCV64-SAME: {{^}} "-internal-isystem" "[[SYSROOT]]/include/c++/8.2.0/backward" +// ELF_RISCV64: "-L +// ELF_RISCV64-SAME: {{^}}[[SYSROOT:[^"]+]]/lib/gcc/riscv64-unknown-elf/8.2.0/rv64imac/lp64" +// ELF_RISCV64-SAME: {{^}} "-L[[SYSROOT]]/lib/gcc/riscv64-unknown-elf/8.2.0/../../../../riscv64-unknown-elf/lib/rv64imac/lp64" + // RUN: not %clangxx %s -### --target=x86_64-unknown-linux-gnu --sysroot=%S/Inputs/debian_multiarch_tree \ // RUN: -ccc-install-dir %S/Inputs/basic_linux_tree/usr/bin -resource-dir=%S/Inputs/resource_dir --stdlib=platform --rtlib=platform \ // RUN: --gcc-install-dir=%S/Inputs/debian_multiarch_tree/usr/lib/gcc/x86_64-linux-gnu 2>&1 | FileCheck %s --check-prefix=INVALID `` https://github.com/llvm/llvm-project/pull/71803 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CodeGen] Implement post-opt linking option for builtin bitocdes (PR #69371)
@@ -113,7 +120,7 @@ class EmitAssemblyHelper { const CodeGenOptions &CodeGenOpts; const clang::TargetOptions &TargetOpts; const LangOptions &LangOpts; - Module *TheModule; + llvm::Module *TheModule; arsenm wrote: Why did this suddenly need qualification? https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CodeGen] Implement post-opt linking option for builtin bitocdes (PR #69371)
@@ -98,6 +100,11 @@ extern cl::opt PrintPipelinePasses; static cl::opt ClSanitizeOnOptimizerEarlyEP( "sanitizer-early-opt-ep", cl::Optional, cl::desc("Insert sanitizers on OptimizerEarlyEP."), cl::init(false)); + +// Re-link builtin bitcodes after optimization +static cl::opt ClRelinkBuiltinBitcodePostop( +"relink-builtin-bitcode-postop", cl::Optional, +cl::desc("Re-link builtin bitcodes after optimization."), cl::init(false)); arsenm wrote: Not a proper flag? Where/how is -mlink-builtin-bitcode defined? https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)
@@ -1809,6 +1816,23 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML, return HasVMemLoad && UsesVgprLoadedOutside; } +bool SIInsertWaitcnts::insertWaitcntAfterMemOp(MachineFunction &MF) { + bool Modified = false; + + for (auto &MBB : MF) { arsenm wrote: I think it makes it harder to reason about the pass as a whole to have it as a totally separate phase https://github.com/llvm/llvm-project/pull/68932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [CodeGen] Implement post-opt linking option for builtin bitocdes (PR #69371)
@@ -98,6 +100,11 @@ extern cl::opt PrintPipelinePasses; static cl::opt ClSanitizeOnOptimizerEarlyEP( "sanitizer-early-opt-ep", cl::Optional, cl::desc("Insert sanitizers on OptimizerEarlyEP."), cl::init(false)); + +// Re-link builtin bitcodes after optimization +static cl::opt ClRelinkBuiltinBitcodePostop( +"relink-builtin-bitcode-postop", cl::Optional, +cl::desc("Re-link builtin bitcodes after optimization."), cl::init(false)); jhuber6 wrote: That's a clang flag, this is presumably more of an LLVM one because this added a new pass that lives in Clang. I still think the solution to this was to just stop the backend from doing this optimization if it will obviously break it, but supposedly that caused performance regressions. https://github.com/llvm/llvm-project/pull/69371 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)
@@ -1809,6 +1816,23 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML, return HasVMemLoad && UsesVgprLoadedOutside; } +bool SIInsertWaitcnts::insertWaitcntAfterMemOp(MachineFunction &MF) { + bool Modified = false; + + for (auto &MBB : MF) { arsenm wrote: Plus I think the two separate, but closely related cl::opts is confusing https://github.com/llvm/llvm-project/pull/68932 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][RISCV]: Enable --gcc-install-dir for bare metal targets (PR #71803)
https://github.com/mihailo-stojanovic closed https://github.com/llvm/llvm-project/pull/71803 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][RISCV]: Enable --gcc-install-dir for bare metal targets (PR #71803)
https://github.com/mihailo-stojanovic reopened https://github.com/llvm/llvm-project/pull/71803 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][RISCV]: Enable --gcc-install-dir for bare metal targets (PR #71803)
https://github.com/mihailo-stojanovic updated https://github.com/llvm/llvm-project/pull/71803 >From 3c73fdf962c2e4fc8d993a34595f21a3926710d0 Mon Sep 17 00:00:00 2001 From: Mihailo Stojanovic Date: Tue, 19 Sep 2023 14:30:00 +0300 Subject: [PATCH] [clang] Enable --gcc-install-dir for RISCV baremetal toolchains Fix the issue where Baremetal toolchain is created instead of the RISCVToolchain when GCC installation is explicitly passed via the gcc-install-dir option. --- clang/lib/Driver/ToolChains/RISCVToolchain.cpp | 3 +++ .../riscv64-unknown-elf/include/c++/8.2.0/.keep | 0 .../include/c++/8.2.0/backward/.keep | 0 clang/test/Driver/gcc-install-dir.cpp| 12 4 files changed, 15 insertions(+) create mode 100644 clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/.keep create mode 100644 clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/backward/.keep diff --git a/clang/lib/Driver/ToolChains/RISCVToolchain.cpp b/clang/lib/Driver/ToolChains/RISCVToolchain.cpp index 7e6abd144428783..6b27ea224eb02ee 100644 --- a/clang/lib/Driver/ToolChains/RISCVToolchain.cpp +++ b/clang/lib/Driver/ToolChains/RISCVToolchain.cpp @@ -40,6 +40,9 @@ bool RISCVToolChain::hasGCCToolchain(const Driver &D, if (Args.getLastArg(options::OPT_gcc_toolchain)) return true; + if (Args.getLastArg(options::OPT_gcc_install_dir_EQ)) +return true; + SmallString<128> GCCDir; llvm::sys::path::append(GCCDir, D.Dir, "..", D.getTargetTriple(), "lib/crt0.o"); diff --git a/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/.keep b/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/.keep new file mode 100644 index 000..e69de29bb2d1d64 diff --git a/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/backward/.keep b/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/backward/.keep new file mode 100644 index 000..e69de29bb2d1d64 diff --git a/clang/test/Driver/gcc-install-dir.cpp b/clang/test/Driver/gcc-install-dir.cpp index 955f162a2ce3a19..d22ca545508370d 100644 --- a/clang/test/Driver/gcc-install-dir.cpp +++ b/clang/test/Driver/gcc-install-dir.cpp @@ -37,6 +37,18 @@ // DEBIAN_X86_64_M32-SAME: {{^}}[[SYSROOT]]/usr/lib/gcc/x86_64-linux-gnu/10/32" // DEBIAN_X86_64_M32-SAME: {{^}} "-L[[SYSROOT]]/usr/lib/gcc/x86_64-linux-gnu/10/../../../../lib32" +/// Test GCC installation on bare-metal RISCV64. +// RUN: %clang -### %s --target=riscv64-unknown-elf --sysroot=%S/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/ --stdlib=platform --rtlib=platform \ +// RUN: --gcc-install-dir=%S/Inputs/multilib_riscv_elf_sdk/lib/gcc/riscv64-unknown-elf/8.2.0/ 2>&1 \ +// RUN: | FileCheck %s --check-prefix=ELF_RISCV64 +// ELF_RISCV64: "-internal-isystem" +// ELF_RISCV64-SAME: {{^}} "[[SYSROOT:[^"]+]]/include/c++/8.2.0" +// ELF_RISCV64-SAME: {{^}} "-internal-isystem" "[[SYSROOT]]/include/c++/8.2.0/riscv64-unknown-elf/rv64imac/lp64" +// ELF_RISCV64-SAME: {{^}} "-internal-isystem" "[[SYSROOT]]/include/c++/8.2.0/backward" +// ELF_RISCV64: "-L +// ELF_RISCV64-SAME: {{^}}[[SYSROOT:[^"]+]]/lib/gcc/riscv64-unknown-elf/8.2.0/rv64imac/lp64" +// ELF_RISCV64-SAME: {{^}} "-L[[SYSROOT]]/lib/gcc/riscv64-unknown-elf/8.2.0/../../../../riscv64-unknown-elf/lib/rv64imac/lp64" + // RUN: not %clangxx %s -### --target=x86_64-unknown-linux-gnu --sysroot=%S/Inputs/debian_multiarch_tree \ // RUN: -ccc-install-dir %S/Inputs/basic_linux_tree/usr/bin -resource-dir=%S/Inputs/resource_dir --stdlib=platform --rtlib=platform \ // RUN: --gcc-install-dir=%S/Inputs/debian_multiarch_tree/usr/lib/gcc/x86_64-linux-gnu 2>&1 | FileCheck %s --check-prefix=INVALID ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Enable --gcc-install-dir for RISCV baremetal toolchains (PR #71803)
https://github.com/mihailo-stojanovic edited https://github.com/llvm/llvm-project/pull/71803 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
jhuber6 wrote: Just noticed I'm actually calling the destructors backwards in AMDGPU. Will fix that. https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D, if (D.isNoDestroy(CGM.getContext())) return; + // OpenMP offloading supports C++ constructors and destructors but we do not + // always have 'atexit' available. Instead lower these to use the LLVM global + // destructors which we can handle directly in the runtime. + if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice && + !D.isStaticLocal() && + (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX())) arsenm wrote: Would also just hide this in a target/lang predicate that lists these https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Enable --gcc-install-dir for RISCV baremetal toolchains (PR #71803)
https://github.com/kito-cheng approved this pull request. Checked with `Generic_GCC::GCCInstallationDetector::init` to make sure clang will use that to search gcc toolchain, so LGTM. https://github.com/llvm/llvm-project/pull/71803 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D, if (D.isNoDestroy(CGM.getContext())) return; + // OpenMP offloading supports C++ constructors and destructors but we do not + // always have 'atexit' available. Instead lower these to use the LLVM global + // destructors which we can handle directly in the runtime. + if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice && + !D.isStaticLocal() && + (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX())) jhuber6 wrote: So just some random helper function like "Does target support X?" https://github.com/llvm/llvm-project/pull/71739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)
https://github.com/CarolineConcatto edited https://github.com/llvm/llvm-project/pull/71290 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)
@@ -420,6 +452,38 @@ let TargetGuard = "sve,bf16" in { def SVSTNT1_VNUM_BF : MInst<"svstnt1_vnum[_{d}]", "vPpld", "b", [IsStore], MemEltTyDefault, "aarch64_sve_stnt1">; } +let TargetGuard = "sve2p1" in { + // Contiguous truncating store from quadword (single vector). + def SVST1UWQ : MInst<"svst1uwq[_{d}]", "vPcd", "iUif", [IsStore], MemEltTyInt32, "aarch64_sve_st1uwq">; + def SVST1UWQ_VNUM : MInst<"svst1uwq_vnum[_{d}]", "vPcld", "iUif", [IsStore], MemEltTyInt32, "aarch64_sve_st1uwq">; + + def SVST1UDQ : MInst<"svst1udq[_{d}]", "vPcd", "lUld", [IsStore], MemEltTyInt64, "aarch64_sve_st1udq">; + def SVST1UDQ_VNUM : MInst<"svst1udq_vnum[_{d}]", "vPcld", "lUld", [IsStore], MemEltTyInt64, "aarch64_sve_st1udq">; + + // Store one vector (vector base + scalar offset) + def SVST1Q_SCATTER_U64BASE_OFFSET : MInst<"svst1q_scatter[_{2}base]_offset[_{d}]", "vPgld", "cUcsUsiUilUlfhdb", [IsScatterStore, IsByteIndexed], MemEltTyDefault, "aarch64_sve_st1q_scatter_scalar_offset">; + def SVST1Q_SCATTER_U64BASE : MInst<"svst1q_scatter[_{2}base][_{d}]", "vPgd", "cUcsUsiUilUlfhdb", [IsScatterStore, IsByteIndexed], MemEltTyDefault, "aarch64_sve_st1q_scatter_scalar_offset">; + + // Store one vector (scalar base + vector offset) + def SVST1Q_SCATTER_U64OFFSET : MInst<"svst1q_scatter_[{3}]offset[_{0}]", "vPpgd", "cUcsUsiUilUlfhdb", [IsScatterStore, IsByteIndexed], MemEltTyDefault, "aarch64_sve_st1q_scatter_vector_offset">; CarolineConcatto wrote: s/svst1q_scatter_[{3}]offset[_{0}]/svst1q_scatter_[{3}]offset[_{d}]/ https://github.com/llvm/llvm-project/pull/71290 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)
@@ -298,6 +298,38 @@ let TargetGuard = "sve,bf16" in { def SVBFMLALT_LANE : SInst<"svbfmlalt_lane[_{0}]", "MMddi", "b", MergeNone, "aarch64_sve_bfmlalt_lane_v2", [IsOverloadNone], [ImmCheck<3, ImmCheck0_7>]>; } +let TargetGuard = "sve2p1" in { + // Contiguous zero-extending load to quadword (single vector). + def SVLD1UWQ : MInst<"svld1uwq[_{d}]", "dPc", "iUif", [IsLoad], MemEltTyInt32, "aarch64_sve_ld1uwq">; + def SVLD1UWQ_VNUM : MInst<"svld1uwq_vnum[_{d}]", "dPcl", "iUif", [IsLoad], MemEltTyInt32, "aarch64_sve_ld1uwq">; + + def SVLD1UDQ : MInst<"svld1udq[_{d}]", "dPc", "lUld", [IsLoad], MemEltTyInt64, "aarch64_sve_ld1udq">; + def SVLD1UDQ_VNUM : MInst<"svld1udq_vnum[_{d}]", "dPcl", "lUld", [IsLoad], MemEltTyInt64, "aarch64_sve_ld1udq">; + + // Load one vector (vector base + scalar offset) + def SVLD1Q_GATHER_U64BASE_OFFSET : MInst<"svld1q_gather[_{2}base]_offset_{d}", "dPgl", "cUcsUsiUilUlfhdb", [IsGatherLoad, IsByteIndexed], MemEltTyDefault, "aarch64_sve_ld1q_gather_scalar_offset">; + def SVLD1Q_GATHER_U64BASE : MInst<"svld1q_gather[_{2}base]_{d}", "dPg", "cUcsUsiUilUlfhdb", [IsGatherLoad, IsByteIndexed], MemEltTyDefault, "aarch64_sve_ld1q_gather_scalar_offset">; + + // Load one vector (scalar base + vector offset) + def SVLD1Q_GATHER_U64OFFSET : MInst<"svld1q_gather_[{3}]offset[_{0}]", "dPcg", "cUcsUsiUilUlfhdb", [IsGatherLoad, IsByteIndexed], MemEltTyDefault, "aarch64_sve_ld1q_gather_vector_offset">; + + // Load N-element structure into N vectors (scalar base) + defm SVLD2Q : StructLoad<"svld2q[_{2}]", "2Pc", "aarch64_sve_ld2q_sret">; + defm SVLD3Q : StructLoad<"svld3q[_{2}]", "3Pc", "aarch64_sve_ld3q_sret">; + defm SVLD4Q : StructLoad<"svld4q[_{2}]", "4Pc", "aarch64_sve_ld4q_sret">; + + // Load N-element structure into N vectors (scalar base, VL displacement) + defm SVLD2Q_VNUM : StructLoad<"svld2q_vnum[_{2}]", "2Pcl", "aarch64_sve_ld2q_sret">; + defm SVLD3Q_VNUM : StructLoad<"svld3q_vnum[_{2}]", "3Pcl", "aarch64_sve_ld3q_sret">; + defm SVLD4Q_VNUM : StructLoad<"svld4q_vnum[_{2}]", "4Pcl", "aarch64_sve_ld4q_sret">; + + // Load quadwords (scalar base + vector index) + def SVLD1Q_GATHER_INDICES_U : MInst<"svld1q_gather_[{3}]index[_{0}]", "dPcg", "sUsiUilUlbhfd", [IsGatherLoad], MemEltTyDefault, "aarch64_sve_ld1q_gather_index">; + + // Load quadwords (vector base + scalar index) + def SVLD1Q_GATHER_INDEX_S : MInst<"svld1q_gather[_{2}base]_index_{0}", "dPgl", "sUsiUilUlbhfd", [IsGatherLoad], MemEltTyDefault, "aarch64_sve_ld1q_gather_scalar_offset">; CarolineConcatto wrote: s/svld1q_gather[_{2}base]_index_{0}/svld1q_gather[_{2}base]_index_{d} https://github.com/llvm/llvm-project/pull/71290 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)
@@ -9497,8 +9500,11 @@ Value *CodeGenFunction::EmitSVEScatterStore(const SVETypeFlags &TypeFlags, // mapped to . However, this might be incompatible with the // actual type being stored. For example, when storing doubles (i64) the // predicated should be instead. At the IR level the type of - // the predicate and the data being stored must match. Cast accordingly. - Ops[1] = EmitSVEPredicateCast(Ops[1], OverloadedTy); + // the predicate and the data being stored must match. Cast to the type + // expected by the intrinsic. The intrinsic itself should be defined in + // a way that enforces relations between parameter types. + Ops[1] = EmitSVEPredicateCast( + Ops[1], cast(F->getArg(1)->getType())); CarolineConcatto wrote: Is this correct? F->getArg(1), is the predicated type, no? Arg[0] = void, Arg[1]= predicate AFAIU we did not shifted the Function arguments. When we do this: Ops.insert(Ops.begin(), Ops.pop_back_val());, does this also shifts F->getArg? https://github.com/llvm/llvm-project/pull/71290 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)
@@ -298,6 +298,38 @@ let TargetGuard = "sve,bf16" in { def SVBFMLALT_LANE : SInst<"svbfmlalt_lane[_{0}]", "MMddi", "b", MergeNone, "aarch64_sve_bfmlalt_lane_v2", [IsOverloadNone], [ImmCheck<3, ImmCheck0_7>]>; } +let TargetGuard = "sve2p1" in { + // Contiguous zero-extending load to quadword (single vector). + def SVLD1UWQ : MInst<"svld1uwq[_{d}]", "dPc", "iUif", [IsLoad], MemEltTyInt32, "aarch64_sve_ld1uwq">; + def SVLD1UWQ_VNUM : MInst<"svld1uwq_vnum[_{d}]", "dPcl", "iUif", [IsLoad], MemEltTyInt32, "aarch64_sve_ld1uwq">; + + def SVLD1UDQ : MInst<"svld1udq[_{d}]", "dPc", "lUld", [IsLoad], MemEltTyInt64, "aarch64_sve_ld1udq">; + def SVLD1UDQ_VNUM : MInst<"svld1udq_vnum[_{d}]", "dPcl", "lUld", [IsLoad], MemEltTyInt64, "aarch64_sve_ld1udq">; + + // Load one vector (vector base + scalar offset) + def SVLD1Q_GATHER_U64BASE_OFFSET : MInst<"svld1q_gather[_{2}base]_offset_{d}", "dPgl", "cUcsUsiUilUlfhdb", [IsGatherLoad, IsByteIndexed], MemEltTyDefault, "aarch64_sve_ld1q_gather_scalar_offset">; + def SVLD1Q_GATHER_U64BASE : MInst<"svld1q_gather[_{2}base]_{d}", "dPg", "cUcsUsiUilUlfhdb", [IsGatherLoad, IsByteIndexed], MemEltTyDefault, "aarch64_sve_ld1q_gather_scalar_offset">; + + // Load one vector (scalar base + vector offset) + def SVLD1Q_GATHER_U64OFFSET : MInst<"svld1q_gather_[{3}]offset[_{0}]", "dPcg", "cUcsUsiUilUlfhdb", [IsGatherLoad, IsByteIndexed], MemEltTyDefault, "aarch64_sve_ld1q_gather_vector_offset">; + + // Load N-element structure into N vectors (scalar base) + defm SVLD2Q : StructLoad<"svld2q[_{2}]", "2Pc", "aarch64_sve_ld2q_sret">; + defm SVLD3Q : StructLoad<"svld3q[_{2}]", "3Pc", "aarch64_sve_ld3q_sret">; + defm SVLD4Q : StructLoad<"svld4q[_{2}]", "4Pc", "aarch64_sve_ld4q_sret">; + + // Load N-element structure into N vectors (scalar base, VL displacement) + defm SVLD2Q_VNUM : StructLoad<"svld2q_vnum[_{2}]", "2Pcl", "aarch64_sve_ld2q_sret">; + defm SVLD3Q_VNUM : StructLoad<"svld3q_vnum[_{2}]", "3Pcl", "aarch64_sve_ld3q_sret">; + defm SVLD4Q_VNUM : StructLoad<"svld4q_vnum[_{2}]", "4Pcl", "aarch64_sve_ld4q_sret">; + + // Load quadwords (scalar base + vector index) + def SVLD1Q_GATHER_INDICES_U : MInst<"svld1q_gather_[{3}]index[_{0}]", "dPcg", "sUsiUilUlbhfd", [IsGatherLoad], MemEltTyDefault, "aarch64_sve_ld1q_gather_index">; CarolineConcatto wrote: nit: remove the extra space before "dPcg" Just in case, here we could also write as: svld1q_gather_[{3}]index[_{d}], both are correct because position 0 is 'd' in "dPcg", that is [default](d: default) But my opinion would be to replace what you have and do: s/svld1q_gather_[{3}]index[_{0}]/svld1q_gather_[{3}]index[_{d}]/g And do the same for SVLD1Q_GATHER_INDEX_S https://github.com/llvm/llvm-project/pull/71290 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)
https://github.com/CarolineConcatto commented: Hey Momchil, Thank you for the work. I left some comments. I did not finish it all. I still need to check the stores. But I will wait for the answers in the load, so I can keep checking the store. https://github.com/llvm/llvm-project/pull/71290 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)
@@ -420,6 +452,38 @@ let TargetGuard = "sve,bf16" in { def SVSTNT1_VNUM_BF : MInst<"svstnt1_vnum[_{d}]", "vPpld", "b", [IsStore], MemEltTyDefault, "aarch64_sve_stnt1">; } +let TargetGuard = "sve2p1" in { + // Contiguous truncating store from quadword (single vector). + def SVST1UWQ : MInst<"svst1uwq[_{d}]", "vPcd", "iUif", [IsStore], MemEltTyInt32, "aarch64_sve_st1uwq">; + def SVST1UWQ_VNUM : MInst<"svst1uwq_vnum[_{d}]", "vPcld", "iUif", [IsStore], MemEltTyInt32, "aarch64_sve_st1uwq">; + + def SVST1UDQ : MInst<"svst1udq[_{d}]", "vPcd", "lUld", [IsStore], MemEltTyInt64, "aarch64_sve_st1udq">; + def SVST1UDQ_VNUM : MInst<"svst1udq_vnum[_{d}]", "vPcld", "lUld", [IsStore], MemEltTyInt64, "aarch64_sve_st1udq">; + + // Store one vector (vector base + scalar offset) + def SVST1Q_SCATTER_U64BASE_OFFSET : MInst<"svst1q_scatter[_{2}base]_offset[_{d}]", "vPgld", "cUcsUsiUilUlfhdb", [IsScatterStore, IsByteIndexed], MemEltTyDefault, "aarch64_sve_st1q_scatter_scalar_offset">; + def SVST1Q_SCATTER_U64BASE : MInst<"svst1q_scatter[_{2}base][_{d}]", "vPgd", "cUcsUsiUilUlfhdb", [IsScatterStore, IsByteIndexed], MemEltTyDefault, "aarch64_sve_st1q_scatter_scalar_offset">; + + // Store one vector (scalar base + vector offset) + def SVST1Q_SCATTER_U64OFFSET : MInst<"svst1q_scatter_[{3}]offset[_{0}]", "vPpgd", "cUcsUsiUilUlfhdb", [IsScatterStore, IsByteIndexed], MemEltTyDefault, "aarch64_sve_st1q_scatter_vector_offset">; + + // Store N vectors into N-element structure (scalar base) + defm SVST2Q : StructStore<"svst2q[_{d}]", "vPc2", "aarch64_sve_st2q">; + defm SVST3Q : StructStore<"svst3q[_{d}]", "vPc3", "aarch64_sve_st3q">; + defm SVST4Q : StructStore<"svst4q[_{d}]", "vPc4", "aarch64_sve_st4q">; + + // Store N vectors into N-element structure (scalar base, VL displacement) + defm SVST2Q_VNUM : StructStore<"svst2q_vnum[_{d}]", "vPcl2", "aarch64_sve_st2q">; + defm SVST3Q_VNUM : StructStore<"svst3q_vnum[_{d}]", "vPcl3", "aarch64_sve_st3q">; + defm SVST4Q_VNUM : StructStore<"svst4q_vnum[_{d}]", "vPcl4", "aarch64_sve_st4q">; + + // Scatter store quadwords (scalar base + vector index) + def SVST1Q_SCATTER_INDICES_U : MInst<"svst1q_scatter_[{3}]index[_{0}]", "vPpgd", "sUsiUilUlbhfd", [IsScatterStore], MemEltTyDefault, "aarch64_sve_st1q_scatter_index">; + + // Scatter store quadwords (vector base + scalar index) + def SVST1Q_SCATTER_INDEX_S : MInst<"svst1q_scatter[_{2}base]_index[_{0}]", "vPgld", "sUsiUilUlbhfd", [IsScatterStore], MemEltTyDefault, "aarch64_sve_st1q_scatter_scalar_offset">; CarolineConcatto wrote: s/svst1q_scatter[_{2}base]_index[_{0}]/svst1q_scatter[_{2}base]_index[_{d}]/ https://github.com/llvm/llvm-project/pull/71290 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)
@@ -1457,6 +1457,24 @@ class AdvSIMD_GatherLoad_VS_Intrinsic ], [IntrReadMem]>; +class AdvSIMD_GatherLoadQ_VS_Intrinsic +: DefaultAttrsIntrinsic<[llvm_anyvector_ty], +[ + llvm_nxv1i1_ty, + llvm_anyvector_ty, CarolineConcatto wrote: So, why do we have the predicated vector as llvm_nxv1i1_ty? I was exception something like LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, because I don't see any cast for the predicate under EmitSVEGatherLoad. This line Ops[0] = EmitSVEPredicateCast( Ops[0], cast(F->getArg(0)->getType())); would map to whatever is the type in the position 0. Second, does it works if we replace the second llvm_anyvector_ty by llvm_nxv2i64_ty? I do think the vector will always be 64 bits https://github.com/llvm/llvm-project/pull/71290 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)
@@ -420,6 +452,38 @@ let TargetGuard = "sve,bf16" in { def SVSTNT1_VNUM_BF : MInst<"svstnt1_vnum[_{d}]", "vPpld", "b", [IsStore], MemEltTyDefault, "aarch64_sve_stnt1">; } +let TargetGuard = "sve2p1" in { + // Contiguous truncating store from quadword (single vector). + def SVST1UWQ : MInst<"svst1uwq[_{d}]", "vPcd", "iUif", [IsStore], MemEltTyInt32, "aarch64_sve_st1uwq">; + def SVST1UWQ_VNUM : MInst<"svst1uwq_vnum[_{d}]", "vPcld", "iUif", [IsStore], MemEltTyInt32, "aarch64_sve_st1uwq">; + + def SVST1UDQ : MInst<"svst1udq[_{d}]", "vPcd", "lUld", [IsStore], MemEltTyInt64, "aarch64_sve_st1udq">; + def SVST1UDQ_VNUM : MInst<"svst1udq_vnum[_{d}]", "vPcld", "lUld", [IsStore], MemEltTyInt64, "aarch64_sve_st1udq">; + + // Store one vector (vector base + scalar offset) + def SVST1Q_SCATTER_U64BASE_OFFSET : MInst<"svst1q_scatter[_{2}base]_offset[_{d}]", "vPgld", "cUcsUsiUilUlfhdb", [IsScatterStore, IsByteIndexed], MemEltTyDefault, "aarch64_sve_st1q_scatter_scalar_offset">; + def SVST1Q_SCATTER_U64BASE : MInst<"svst1q_scatter[_{2}base][_{d}]", "vPgd", "cUcsUsiUilUlfhdb", [IsScatterStore, IsByteIndexed], MemEltTyDefault, "aarch64_sve_st1q_scatter_scalar_offset">; + + // Store one vector (scalar base + vector offset) + def SVST1Q_SCATTER_U64OFFSET : MInst<"svst1q_scatter_[{3}]offset[_{0}]", "vPpgd", "cUcsUsiUilUlfhdb", [IsScatterStore, IsByteIndexed], MemEltTyDefault, "aarch64_sve_st1q_scatter_vector_offset">; + + // Store N vectors into N-element structure (scalar base) + defm SVST2Q : StructStore<"svst2q[_{d}]", "vPc2", "aarch64_sve_st2q">; + defm SVST3Q : StructStore<"svst3q[_{d}]", "vPc3", "aarch64_sve_st3q">; + defm SVST4Q : StructStore<"svst4q[_{d}]", "vPc4", "aarch64_sve_st4q">; + + // Store N vectors into N-element structure (scalar base, VL displacement) + defm SVST2Q_VNUM : StructStore<"svst2q_vnum[_{d}]", "vPcl2", "aarch64_sve_st2q">; + defm SVST3Q_VNUM : StructStore<"svst3q_vnum[_{d}]", "vPcl3", "aarch64_sve_st3q">; + defm SVST4Q_VNUM : StructStore<"svst4q_vnum[_{d}]", "vPcl4", "aarch64_sve_st4q">; + + // Scatter store quadwords (scalar base + vector index) + def SVST1Q_SCATTER_INDICES_U : MInst<"svst1q_scatter_[{3}]index[_{0}]", "vPpgd", "sUsiUilUlbhfd", [IsScatterStore], MemEltTyDefault, "aarch64_sve_st1q_scatter_index">; CarolineConcatto wrote: s/svst1q_scatter_[{3}]offset[_{0}]/svst1q_scatter_[{3}]offset[_{d}]/ you could also write: svst1q_scatter_[{3}]offset[_{4}], but I rather write as d, because it does not depends on the position of the parameter. https://github.com/llvm/llvm-project/pull/71290 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][Interp] Implement bitwise operations for IntegralAP (PR #71807)
https://github.com/tbaederr created https://github.com/llvm/llvm-project/pull/71807 None >From 4d13e7b92c5d6bf08554a2e251ba65b8f433fb87 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Timm=20B=C3=A4der?= Date: Thu, 9 Nov 2023 14:29:51 +0100 Subject: [PATCH] [clang][Interp] Implement bitwise operations for IntegralAP --- clang/lib/AST/Interp/IntegralAP.h | 8 +++- clang/test/AST/Interp/intap.cpp | 9 + 2 files changed, 12 insertions(+), 5 deletions(-) diff --git a/clang/lib/AST/Interp/IntegralAP.h b/clang/lib/AST/Interp/IntegralAP.h index 88de1f1392e6813..c8850a4bbb574aa 100644 --- a/clang/lib/AST/Interp/IntegralAP.h +++ b/clang/lib/AST/Interp/IntegralAP.h @@ -219,21 +219,19 @@ template class IntegralAP final { static bool bitAnd(IntegralAP A, IntegralAP B, unsigned OpBits, IntegralAP *R) { -// FIXME: Implement. -assert(false); +*R = IntegralAP(A.V & B.V); return false; } static bool bitOr(IntegralAP A, IntegralAP B, unsigned OpBits, IntegralAP *R) { -assert(false); +*R = IntegralAP(A.V | B.V); return false; } static bool bitXor(IntegralAP A, IntegralAP B, unsigned OpBits, IntegralAP *R) { -// FIXME: Implement. -assert(false); +*R = IntegralAP(A.V ^ B.V); return false; } diff --git a/clang/test/AST/Interp/intap.cpp b/clang/test/AST/Interp/intap.cpp index 34c8d0565082994..a8893c8cb4eb9b8 100644 --- a/clang/test/AST/Interp/intap.cpp +++ b/clang/test/AST/Interp/intap.cpp @@ -157,4 +157,13 @@ namespace Bitfields { // expected-warning {{changes value from 100 to 0}} } +namespace BitOps { + constexpr unsigned __int128 UZero = 0; + constexpr unsigned __int128 Max = ~UZero; + static_assert(Max == ~0, ""); + static_assert((Max & 0) == 0, ""); + static_assert((UZero | 0) == 0, ""); + static_assert((Max ^ Max) == 0, ""); +} + #endif ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang][Interp] Implement bitwise operations for IntegralAP (PR #71807)
llvmbot wrote: @llvm/pr-subscribers-clang Author: Timm Baeder (tbaederr) Changes --- Full diff: https://github.com/llvm/llvm-project/pull/71807.diff 2 Files Affected: - (modified) clang/lib/AST/Interp/IntegralAP.h (+3-5) - (modified) clang/test/AST/Interp/intap.cpp (+9) ``diff diff --git a/clang/lib/AST/Interp/IntegralAP.h b/clang/lib/AST/Interp/IntegralAP.h index 88de1f1392e6813..c8850a4bbb574aa 100644 --- a/clang/lib/AST/Interp/IntegralAP.h +++ b/clang/lib/AST/Interp/IntegralAP.h @@ -219,21 +219,19 @@ template class IntegralAP final { static bool bitAnd(IntegralAP A, IntegralAP B, unsigned OpBits, IntegralAP *R) { -// FIXME: Implement. -assert(false); +*R = IntegralAP(A.V & B.V); return false; } static bool bitOr(IntegralAP A, IntegralAP B, unsigned OpBits, IntegralAP *R) { -assert(false); +*R = IntegralAP(A.V | B.V); return false; } static bool bitXor(IntegralAP A, IntegralAP B, unsigned OpBits, IntegralAP *R) { -// FIXME: Implement. -assert(false); +*R = IntegralAP(A.V ^ B.V); return false; } diff --git a/clang/test/AST/Interp/intap.cpp b/clang/test/AST/Interp/intap.cpp index 34c8d0565082994..a8893c8cb4eb9b8 100644 --- a/clang/test/AST/Interp/intap.cpp +++ b/clang/test/AST/Interp/intap.cpp @@ -157,4 +157,13 @@ namespace Bitfields { // expected-warning {{changes value from 100 to 0}} } +namespace BitOps { + constexpr unsigned __int128 UZero = 0; + constexpr unsigned __int128 Max = ~UZero; + static_assert(Max == ~0, ""); + static_assert((Max & 0) == 0, ""); + static_assert((UZero | 0) == 0, ""); + static_assert((Max ^ Max) == 0, ""); +} + #endif `` https://github.com/llvm/llvm-project/pull/71807 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [X86][AVX10] Permit AVX512 options/features used together with AVX10 (PR #71318)
https://github.com/phoebewang updated https://github.com/llvm/llvm-project/pull/71318 >From d9ee6309924e7f248695cbd488afe98273432e84 Mon Sep 17 00:00:00 2001 From: Phoebe Wang Date: Sun, 5 Nov 2023 21:15:53 +0800 Subject: [PATCH 1/3] [X86][AVX10] Permit AVX512 options/features used together with AVX10 This patch relaxes the driver logic to permit combinations between AVX512 and AVX10 options and makes sure we have a unified behavior between options and features combination. Here are rules we are following when handle these combinations: 1. evex512 can only be used for avx512xxx options/features. It will be ignored if used without them; 2. avx512xxx and avx10.xxx are options in two worlds. Avoid to use them together in any case. It will enable a common super set when they are used together. E.g., "-mavx512f -mavx10.1-256" euqals "-mavx10.1-512". Compiler emits warnings when user using combinations like "-mavx512f -mavx10.1-256" in case they won't get unexpected result silently. --- .../clang/Basic/DiagnosticCommonKinds.td | 2 + clang/lib/Basic/Targets/X86.cpp | 57 --- clang/lib/Driver/ToolChains/Arch/X86.cpp | 7 --- clang/lib/Headers/avx2intrin.h| 4 +- clang/lib/Headers/avx512bf16intrin.h | 3 +- clang/lib/Headers/avx512bwintrin.h| 4 +- clang/lib/Headers/avx512dqintrin.h| 4 +- clang/lib/Headers/avx512fintrin.h | 8 ++- clang/lib/Headers/avx512fp16intrin.h | 6 +- clang/lib/Headers/avx512ifmavlintrin.h| 10 +++- clang/lib/Headers/avx512pfintrin.h| 5 -- clang/lib/Headers/avx512vbmivlintrin.h| 11 +++- clang/lib/Headers/avx512vlbf16intrin.h| 14 +++-- clang/lib/Headers/avx512vlbitalgintrin.h | 10 +++- clang/lib/Headers/avx512vlbwintrin.h | 10 +++- clang/lib/Headers/avx512vlcdintrin.h | 11 +++- clang/lib/Headers/avx512vldqintrin.h | 10 +++- clang/lib/Headers/avx512vlfp16intrin.h| 4 +- clang/lib/Headers/avx512vlintrin.h| 10 +++- clang/lib/Headers/avx512vlvbmi2intrin.h | 10 +++- clang/lib/Headers/avx512vlvnniintrin.h| 10 +++- .../lib/Headers/avx512vlvp2intersectintrin.h | 10 ++-- clang/lib/Headers/avx512vpopcntdqvlintrin.h | 8 ++- clang/lib/Headers/avxintrin.h | 4 +- clang/lib/Headers/emmintrin.h | 4 +- clang/lib/Headers/gfniintrin.h| 14 +++-- clang/lib/Headers/pmmintrin.h | 2 +- clang/lib/Headers/smmintrin.h | 2 +- clang/lib/Headers/tmmintrin.h | 4 +- clang/lib/Headers/xmmintrin.h | 4 +- clang/test/CodeGen/X86/avx512-error.c | 13 + clang/test/CodeGen/target-avx-abi-diag.c | 28 - clang/test/Driver/x86-target-features.c | 6 +- 33 files changed, 214 insertions(+), 95 deletions(-) diff --git a/clang/include/clang/Basic/DiagnosticCommonKinds.td b/clang/include/clang/Basic/DiagnosticCommonKinds.td index 9f0ccd255a32148..8084a4ce0d1751b 100644 --- a/clang/include/clang/Basic/DiagnosticCommonKinds.td +++ b/clang/include/clang/Basic/DiagnosticCommonKinds.td @@ -346,6 +346,8 @@ def err_opt_not_valid_on_target : Error< "option '%0' cannot be specified on this target">; def err_invalid_feature_combination : Error< "invalid feature combination: %0">; +def warn_invalid_feature_combination : Warning< + "invalid feature combination: %0">, InGroup>; def warn_target_unrecognized_env : Warning< "mismatch between architecture and environment in target triple '%0'; did you mean '%1'?">, InGroup; diff --git a/clang/lib/Basic/Targets/X86.cpp b/clang/lib/Basic/Targets/X86.cpp index eec3cd558435e2a..9cfda95f385d627 100644 --- a/clang/lib/Basic/Targets/X86.cpp +++ b/clang/lib/Basic/Targets/X86.cpp @@ -119,9 +119,13 @@ bool X86TargetInfo::initFeatureMap( setFeatureEnabled(Features, F, true); std::vector UpdatedFeaturesVec; - bool HasEVEX512 = true; + std::vector UpdatedAVX10FeaturesVec; + int HasEVEX512 = -1; bool HasAVX512F = false; bool HasAVX10 = false; + bool HasAVX10_512 = false; + std::string LastAVX10; + std::string LastAVX512; for (const auto &Feature : FeaturesVec) { // Expand general-regs-only to -x86, -mmx and -sse if (Feature == "+general-regs-only") { @@ -131,35 +135,50 @@ bool X86TargetInfo::initFeatureMap( continue; } -if (Feature.substr(0, 7) == "+avx10.") { - HasAVX10 = true; - HasAVX512F = true; - if (Feature.substr(Feature.size() - 3, 3) == "512") { -HasEVEX512 = true; - } else if (Feature.substr(7, 2) == "1-") { -HasEVEX512 = false; +if (Feature.substr(1, 6) == "avx10.") { + if (Feature[0] == '+') { +HasAVX10 = true; +if (Feature.substr(Feature.size() - 3, 3) == "512") + HasAVX10_512 = true; +LastAVX10 = Feature; + } else if (HasAVX10 && Feature == "-avx
[clang] [llvm] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)
@@ -9497,8 +9500,11 @@ Value *CodeGenFunction::EmitSVEScatterStore(const SVETypeFlags &TypeFlags, // mapped to . However, this might be incompatible with the // actual type being stored. For example, when storing doubles (i64) the // predicated should be instead. At the IR level the type of - // the predicate and the data being stored must match. Cast accordingly. - Ops[1] = EmitSVEPredicateCast(Ops[1], OverloadedTy); + // the predicate and the data being stored must match. Cast to the type + // expected by the intrinsic. The intrinsic itself should be defined in + // a way that enforces relations between parameter types. + Ops[1] = EmitSVEPredicateCast( + Ops[1], cast(F->getArg(1)->getType())); momchil-velikov wrote: Certainly when we operate on `Ops` it does not affect `F`. https://github.com/llvm/llvm-project/pull/71290 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [X86][AVX10] Permit AVX512 options/features used together with AVX10 (PR #71318)
@@ -119,9 +119,13 @@ bool X86TargetInfo::initFeatureMap( setFeatureEnabled(Features, F, true); std::vector UpdatedFeaturesVec; - bool HasEVEX512 = true; + std::vector UpdatedAVX10FeaturesVec; + int HasEVEX512 = -1; phoebewang wrote: I think it's better to use enum. It's a 3-status flag. std::optional isn't much useful here. https://github.com/llvm/llvm-project/pull/71318 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [X86][AVX10] Permit AVX512 options/features used together with AVX10 (PR #71318)
@@ -15,8 +15,12 @@ #define __AVX2INTRIN_H /* Define the default attributes for the functions in this file. */ -#define __DEFAULT_FN_ATTRS256 __attribute__((__always_inline__, __nodebug__, __target__("avx2"), __min_vector_width__(256))) -#define __DEFAULT_FN_ATTRS128 __attribute__((__always_inline__, __nodebug__, __target__("avx2"), __min_vector_width__(128))) +#define __DEFAULT_FN_ATTRS256 \ + __attribute__((__always_inline__, __nodebug__, \ + __target__("avx2,no-evex512"), __min_vector_width__(256))) phoebewang wrote: We have defined parts AVX512 intrinsics with `no-evex512` and some of them will call into these AVX2 intrinsics. Then we are facing a problem that we cannot call them in some cases because we didn't specify `no-evex512` for them. https://github.com/llvm/llvm-project/pull/71318 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [X86][AVX10] Permit AVX512 options/features used together with AVX10 (PR #71318)
@@ -50,11 +50,11 @@ typedef __bf16 __m128bh __attribute__((__vector_size__(16), __aligned__(16))); /* Define the default attributes for the functions in this file. */ #define __DEFAULT_FN_ATTRS \ - __attribute__((__always_inline__, __nodebug__, __target__("sse2"), \ - __min_vector_width__(128))) + __attribute__((__always_inline__, __nodebug__, \ + __target__("sse2,no-evex512"), __min_vector_width__(128))) #define __DEFAULT_FN_ATTRS_MMX \ - __attribute__((__always_inline__, __nodebug__, __target__("mmx,sse2"), \ - __min_vector_width__(64))) + __attribute__((__always_inline__, __nodebug__, \ phoebewang wrote: The same reason as above. https://github.com/llvm/llvm-project/pull/71318 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [X86][AVX10] Permit AVX512 options/features used together with AVX10 (PR #71318)
@@ -131,35 +135,50 @@ bool X86TargetInfo::initFeatureMap( continue; } -if (Feature.substr(0, 7) == "+avx10.") { - HasAVX10 = true; - HasAVX512F = true; - if (Feature.substr(Feature.size() - 3, 3) == "512") { -HasEVEX512 = true; - } else if (Feature.substr(7, 2) == "1-") { -HasEVEX512 = false; +if (Feature.substr(1, 6) == "avx10.") { + if (Feature[0] == '+') { +HasAVX10 = true; +if (Feature.substr(Feature.size() - 3, 3) == "512") + HasAVX10_512 = true; +LastAVX10 = Feature; + } else if (HasAVX10 && Feature == "-avx10.1-256") { +HasAVX10 = false; +HasAVX10_512 = false; + } else if (HasAVX10_512 && Feature == "-avx10.1-512") { +HasAVX10_512 = false; } + // Postpone AVX10 features handling after AVX512 settled. + UpdatedAVX10FeaturesVec.push_back(Feature); + continue; } else if (!HasAVX512F && Feature.substr(0, 7) == "+avx512") { HasAVX512F = true; + LastAVX512 = Feature; } else if (HasAVX512F && Feature == "-avx512f") { HasAVX512F = false; -} else if (HasAVX10 && Feature == "-avx10.1-256") { - HasAVX10 = false; - HasAVX512F = false; -} else if (!HasEVEX512 && Feature == "+evex512") { +} else if (HasEVEX512 != true && Feature == "+evex512") { phoebewang wrote: I think "std::optional" doesn't help here because we need to distinguish the uninitialized status and false too. https://github.com/llvm/llvm-project/pull/71318 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AMDGPU] Add an option to disable unsafe uses of atomic xor (PR #69229)
pasaulais wrote: @arsenm, could you share this unfinished patch you were working on? I could start from scratch but I don't want to duplicate the work you've already done. https://github.com/llvm/llvm-project/pull/69229 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [X86][AVX10] Permit AVX512 options/features used together with AVX10 (PR #71318)
phoebewang wrote: > I'm a little bit confused, What's the expected behavior of `+avx10.1-512 > -avx10.1-256` in codegen aspect? Should we generate only instructions in the > difference of sets? Or do we consider `avx10.1-256` as a base of > `avx10.1-512` and if it is disabled `avx10.1-512` can't be enabled? `-avx10.1-256` works like `-avx512f`, that says, they are special as a fundamental feature, which will turn off all derivative features for AVX10 and AVX512 respectively. OTOH, derivative features will only turn off the difference set, e.g., `+avx10.3-256 -avx10.2-256` equals to `+avx10.1-256`. https://github.com/llvm/llvm-project/pull/71318 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[llvm] [clang] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (PR #71795)
https://github.com/MDevereau updated https://github.com/llvm/llvm-project/pull/71795 >From 9846bc9efd79e6e3c2662ea42367c102df88799d Mon Sep 17 00:00:00 2001 From: Matt Devereau Date: Thu, 9 Nov 2023 10:50:05 + Subject: [PATCH 1/2] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics Adds the builtins: void svldr_zt(uint64_t zt, const void *rn) void svstr_zt(uint64_t zt, void *rn) And the intrinsics: call void @llvm.aarch64.sme.ldr.zt(i32, ptr) tail call void @llvm.aarch64.sme.str.zt(i32, ptr) --- clang/include/clang/Basic/arm_sme.td | 5 ++ clang/include/clang/Basic/arm_sve.td | 9 .../acle_sme2_ldr_str_zt.c| 51 +++ llvm/include/llvm/IR/IntrinsicsAArch64.td | 11 ++-- .../Target/AArch64/AArch64ISelDAGToDAG.cpp| 7 ++- .../Target/AArch64/AArch64ISelLowering.cpp| 21 llvm/lib/Target/AArch64/AArch64ISelLowering.h | 2 + .../Target/AArch64/AArch64RegisterInfo.cpp| 6 +++ .../lib/Target/AArch64/AArch64SMEInstrInfo.td | 4 +- llvm/lib/Target/AArch64/SMEInstrFormats.td| 23 +++-- .../CodeGen/AArch64/sme2-intrinsics-zt0.ll| 27 ++ 11 files changed, 153 insertions(+), 13 deletions(-) create mode 100644 clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c create mode 100644 llvm/test/CodeGen/AArch64/sme2-intrinsics-zt0.ll diff --git a/clang/include/clang/Basic/arm_sme.td b/clang/include/clang/Basic/arm_sme.td index b5655afdf419ecf..fe3de56ce3298c5 100644 --- a/clang/include/clang/Basic/arm_sme.td +++ b/clang/include/clang/Basic/arm_sme.td @@ -298,3 +298,8 @@ multiclass ZAAddSub { defm SVADD : ZAAddSub<"add">; defm SVSUB : ZAAddSub<"sub">; + +let TargetGuard = "sme2" in { + def SVLDR_ZT : Inst<"svldr_zt", "viQ", "", MergeNone, "aarch64_sme_ldr_zt", [IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], [ImmCheck<0, ImmCheck0_0>]>; + def SVSTR_ZT : Inst<"svstr_zt", "vi%", "", MergeNone, "aarch64_sme_str_zt", [IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], [ImmCheck<0, ImmCheck0_0>]>; +} diff --git a/clang/include/clang/Basic/arm_sve.td b/clang/include/clang/Basic/arm_sve.td index 3d4c2129565903d..f0b3747898d4145 100644 --- a/clang/include/clang/Basic/arm_sve.td +++ b/clang/include/clang/Basic/arm_sve.td @@ -1813,6 +1813,15 @@ def SVWHILERW_H_BF16 : SInst<"svwhilerw[_{1}]", "Pcc", "b", MergeNone, "aarch64_ def SVWHILEWR_H_BF16 : SInst<"svwhilewr[_{1}]", "Pcc", "b", MergeNone, "aarch64_sve_whilewr_h", [IsOverloadWhileRW]>; } +// // +// // Spill and fill of ZT0 +// // + +// let TargetGuard = "sme2" in { +// def SVLDR_ZT : Inst<"svldr_zt", "viQ", "", "aarch64_sme_ldr_zt", [IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], [ImmCheck<0, ImmCheck0_0>]>; +// def SVSTR_ZT : Inst<"svstr_zt", "vi%", "", "aarch64_sme_str_zt", [IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], [ImmCheck<0, ImmCheck0_0>]>; +// } + // SVE2 - Extended table lookup/permute let TargetGuard = "sve2" in { diff --git a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c new file mode 100644 index 000..3d70ded6b469ba1 --- /dev/null +++ b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c @@ -0,0 +1,51 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py + +// REQUIRES: aarch64-registered-target + +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK +// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s +// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK +// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s + +#include + +#ifdef SVE_OVERLOADED_FORMS +// A simple used,unused... macro, long enough to represent any SVE builtin. +#define SVE_ACLE_FUNC(A1,A2_UNUSED) A1 +#else +#define SVE_ACLE_FUNC(A1,A2) A1##A2 +#endif + +// LDR ZT0 + +// CHECK-LABEL: @test_svldr_zt( +// CHECK-NEXT: entry: +// CHECK-NEXT:tail call void @llvm.aarch64.sme.ldr.zt(i32 0, ptr [[BASE:%.*]]) +// CHECK-NEXT:ret void +// +// CPP-CHECK-LABEL: @_Z13tes
[clang] [CUDA][HIP] Make template implicitly host device (PR #70369)
yxsamliu wrote: ping This patch passes our internal CI. https://github.com/llvm/llvm-project/pull/70369 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Enable --gcc-install-dir for RISCV baremetal toolchains (PR #71803)
asb wrote: Tagging @MaskRay for a quick check of this too, if he has time. https://github.com/llvm/llvm-project/pull/71803 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [clang-tidy] Improve `container-data-pointer` check to use `c_str()` (PR #71304)
https://github.com/EugeneZelenko requested changes to this pull request. https://github.com/llvm/llvm-project/pull/71304 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [clang-tidy] Improve `container-data-pointer` check to use `c_str()` (PR #71304)
https://github.com/EugeneZelenko edited https://github.com/llvm/llvm-project/pull/71304 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang-tools-extra] [clang-tidy] Improve `container-data-pointer` check to use `c_str()` (PR #71304)
@@ -3,13 +3,9 @@ readability-container-data-pointer == -Finds cases where code could use ``data()`` rather than the address of the -element at index 0 in a container. This pattern is commonly used to materialize -a pointer to the backing data of a container. ``std::vector`` and -``std::string`` provide a ``data()`` accessor to retrieve the data pointer which -should be preferred. +Finds cases where code references the address of the element at index 0 in a container and replaces them with calls to ``data()`` or ``c_str()``. -This also ensures that in the case that the container is empty, the data pointer +Using ``data()`` or ``c_str()`` is more readable and ensures that if the container is empty, the data pointer EugeneZelenko wrote: Please follow 80 characters limit. https://github.com/llvm/llvm-project/pull/71304 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739 >From c3df637dd2cb9a5210cb90a3bb69a63c31236039 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 7 Nov 2023 17:12:31 -0600 Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549 --- clang/lib/CodeGen/CGDeclCXX.cpp | 13 +- clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 -- clang/lib/CodeGen/CGOpenMPRuntime.h | 8 -- clang/lib/CodeGen/CodeGenFunction.h | 5 + clang/lib/CodeGen/CodeGenModule.h | 14 +- clang/lib/CodeGen/ItaniumCXXABI.cpp | 7 + .../amdgcn_openmp_device_math_constexpr.cpp | 48 +-- .../amdgcn_target_global_constructor.cpp | 30 ++-- clang/test/OpenMP/declare_target_codegen.cpp | 1 - ...x_declare_target_var_ctor_dtor_codegen.cpp | 35 + .../llvm/Frontend/OpenMP/OMPIRBuilder.h | 4 - llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp | 7 +- .../plugins-nextgen/amdgpu/src/rtl.cpp| 52 +++ .../common/PluginInterface/GlobalHandler.h| 10 +- .../PluginInterface/PluginInterface.cpp | 7 + .../common/PluginInterface/PluginInterface.h | 14 ++ .../plugins-nextgen/cuda/src/rtl.cpp | 115 openmp/libomptarget/src/rtl.cpp | 6 + .../test/libc/global_ctor_dtor.cpp| 37 + 19 files changed, 327 insertions(+), 216 deletions(-) create mode 100644 openmp/libomptarget/test/libc/global_ctor_dtor.cpp diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp index 3fa28b343663f61..e08a1e5f42df20c 100644 --- a/clang/lib/CodeGen/CGDeclCXX.cpp +++ b/clang/lib/CodeGen/CGDeclCXX.cpp @@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD, registerGlobalDtorWithAtExit(dtorStub); } +/// Register a global destructor using the LLVM 'llvm.global_dtors' global. +void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD, + llvm::FunctionCallee Dtor, + llvm::Constant *Addr) { + // Create a function which calls the destructor. + llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr); + CGM.AddGlobalDtor(dtorStub); +} + void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) { // extern "C" int atexit(void (*f)(void)); assert(dtorStub->getType() == @@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D, D->hasAttr())) return; - if (getLangOpts().OpenMP && - getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit)) -return; - // Check if we've already initialized this decl. auto I = DelayedCXXInitPosition.find(D); if (I != DelayedCXXInitPosition.end() && I->second == ~0U) diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp index a8e1150e44566b8..d2be8141a3a4b31 100644 --- a/clang/lib/Code
[llvm] [clang] [AIX] Enable tests relating to 64-bit XCOFF object files (PR #71814)
https://github.com/jakeegan created https://github.com/llvm/llvm-project/pull/71814 We now have 64-bit XCOFF object file support, so these tests can be enabled again. However, some tests still fail due to unsupported debug sections, so I cleaned up their comments. >From 080887dca39dacdf482298b30137e494c0cbcb8b Mon Sep 17 00:00:00 2001 From: Jake Egan <5326451+jakee...@users.noreply.github.com> Date: Thu, 9 Nov 2023 10:05:10 -0500 Subject: [PATCH] [AIX] Enable tests relating to 64-bit XCOFF object files --- clang/test/lit.cfg.py | 37 - llvm/test/lit.cfg.py | 28 .../DebugInfo/DWARF/DWARFDebugInfoTest.cpp| 70 ++-- .../DebugInfo/DWARF/DWARFDebugLineTest.cpp| 155 ++ 4 files changed, 28 insertions(+), 262 deletions(-) diff --git a/clang/test/lit.cfg.py b/clang/test/lit.cfg.py index 60843ef8a142048..271372b928ac55c 100644 --- a/clang/test/lit.cfg.py +++ b/clang/test/lit.cfg.py @@ -332,43 +332,6 @@ def calculate_arch_features(arch_string): config.available_features.add("llvm-driver") -def exclude_unsupported_files_for_aix(dirname): -for filename in os.listdir(dirname): -source_path = os.path.join(dirname, filename) -if os.path.isdir(source_path): -continue -f = open(source_path, "r", encoding="ISO-8859-1") -try: -data = f.read() -# 64-bit object files are not supported on AIX, so exclude the tests. -if ( -any( -option in data -for option in ( -"-emit-obj", -"-fmodule-format=obj", -"-fintegrated-as", -) -) -and "64" in config.target_triple -): -config.excludes += [filename] -finally: -f.close() - - -if "aix" in config.target_triple: -for directory in ( -"/CodeGenCXX", -"/Misc", -"/Modules", -"/PCH", -"/Driver", -"/ASTMerge/anonymous-fields", -"/ASTMerge/injected-class-name-decl", -): -exclude_unsupported_files_for_aix(config.test_source_root + directory) - # Some tests perform deep recursion, which requires a larger pthread stack size # than the relatively low default of 192 KiB for 64-bit processes on AIX. The # `AIXTHREAD_STK` environment variable provides a non-intrusive way to request diff --git a/llvm/test/lit.cfg.py b/llvm/test/lit.cfg.py index 022d1aedbdcdbb6..f3b49a398e76062 100644 --- a/llvm/test/lit.cfg.py +++ b/llvm/test/lit.cfg.py @@ -601,34 +601,6 @@ def have_ld64_plugin_support(): config.available_features.add("use_msan_with_origins") -def exclude_unsupported_files_for_aix(dirname): -for filename in os.listdir(dirname): -source_path = os.path.join(dirname, filename) -if os.path.isdir(source_path): -continue -f = open(source_path, "r") -try: -data = f.read() -# 64-bit object files are not supported on AIX, so exclude the tests. -if ( -"-emit-obj" in data or "-filetype=obj" in data -) and "64" in config.target_triple: -config.excludes += [filename] -finally: -f.close() - - -if "aix" in config.target_triple: -for directory in ( -"/CodeGen/X86", -"/DebugInfo", -"/DebugInfo/X86", -"/DebugInfo/Generic", -"/LTO/X86", -"/Linker", -): -exclude_unsupported_files_for_aix(config.test_source_root + directory) - # Some tools support an environment variable "OBJECT_MODE" on AIX OS, which # controls the kind of objects they will support. If there is no "OBJECT_MODE" # environment variable specified, the default behaviour is to support 32-bit diff --git a/llvm/unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp b/llvm/unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp index d81557d756300c8..0b7f8f41bc53f43 100644 --- a/llvm/unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp +++ b/llvm/unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp @@ -33,6 +33,12 @@ #include "gtest/gtest.h" #include +// AIX doesn't support debug_str_offsets or debug_addr sections +#ifdef _AIX +#define NO_SUPPORT_DEBUG_STR_OFFSETS +#define NO_SUPPORT_DEBUG_ADDR +#endif + using namespace llvm; using namespace dwarf; using namespace utils; @@ -435,11 +441,7 @@ TEST(DWARFDebugInfo, TestDWARF32Version2Addr4AllForms) { TestAllForms<2, AddrType, RefAddrType>(); } -#ifdef _AIX -TEST(DWARFDebugInfo, DISABLED_TestDWARF32Version2Addr8AllForms) { -#else TEST(DWARFDebugInfo, TestDWARF32Version2Addr8AllForms) { -#endif // Test that we can decode all forms for DWARF32, version 2, with 4 byte // addresses. typedef uint64_t AddrType; @@ -457,11 +459,7 @@ TEST(DWARFDebugInfo, TestDWARF32Version3Addr4AllForms) { TestAllForms<3, AddrType