[clang-tools-extra] [clang-tidy] add modernize-use-std-numbers (PR #66583)

2023-11-09 Thread Piotr Zegar via cfe-commits

https://github.com/PiotrZSL requested changes to this pull request.

Example:
```
llvm/include/llvm/Support/MathExtras.h:59:31: warning: prefer std::numbers math 
constant [modernize-use-std-numbers]
   59 | inv_sqrt3f  = .577350269F, // (0x1.279a74P-1)
  |   ^~~
  |   std::numbers::egamma_v
```

```
egammaf = .577215665F
```

Looks like implementing this check as multiple matchers isn't a good idea, 
simply because we pick up the first one that matches instead of the nearest 
one. This leads to bugs when dealing with proper values.

In ideal conditions, something like x * 3.14 should even be detected as pi. 
Also, the warning message should already say which constant from std::numbers 
should be used and how far the current value is from the proposed one.
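
A minimal sketch of the "nearest match" idea, not the check's actual implementation. The table, the `nearestConstant` helper, and the relative tolerance are all hypothetical; the constant values are hard-coded doubles so the sketch does not need C++20's `<numbers>`.

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <string_view>

// Illustrative table only; a real check would cover every constant in
// <numbers>. egamma is deliberately listed before inv_sqrt3 so that a
// "first match wins" strategy would pick the wrong one for .577350269.
struct NamedConstant {
  std::string_view Name;
  double Value;
};

constexpr NamedConstant Constants[] = {
    {"std::numbers::egamma", 0.5772156649015329},
    {"std::numbers::inv_sqrt3", 0.5773502691896257},
    {"std::numbers::pi", 3.141592653589793},
    {"std::numbers::e", 2.718281828459045},
};

struct Match {
  std::string_view Name;
  double AbsDiff; // |literal - constant|, usable in the diagnostic text
};

// Hypothetical helper: returns the constant closest to Literal, provided it
// lies within RelTolerance of that constant; otherwise std::nullopt.
std::optional<Match> nearestConstant(double Literal, double RelTolerance) {
  assert(RelTolerance > 0.0 && "tolerance must be positive");
  std::optional<Match> Best;
  for (const NamedConstant &C : Constants) {
    const double Diff = std::fabs(Literal - C.Value);
    if (Diff > RelTolerance * std::fabs(C.Value))
      continue; // Too far away to be a spelling of this constant.
    if (!Best || Diff < Best->AbsDiff)
      Best = Match{C.Name, Diff};
  }
  return Best;
}
```

With a tolerance of `1e-3`, both `egamma` and `inv_sqrt3` are candidates for `.577350269`, and only comparing distances selects `inv_sqrt3`. The returned `AbsDiff` is what the diagnostic could report as the distance between the literal and the proposed constant.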

https://github.com/llvm/llvm-project/pull/66583
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Llvm modules on demand bmi (PR #71773)

2023-11-09 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

> > There are 2 things in the patch. One is to generate the BMI and the object 
> > file in one phase (phase here means preprocess, precompile, compile, ...).
> 
> This is the main point of the patch - to do this efficiently.

Got it. Then we can be more focused.

> 
> > But after we introduced thin BMI, it looks inefficient to write the AST 
> > twice. So it is on my TODO list after we land the thin BMI patch. BTW, I 
> > think we should do thin in CodeGen action instead of hacking on 
> > WrappedASTConsumer.
> 
> I am curious as to why you think that the multiplex AST consumer is a hack - 
> it seems to be designed exactly for this purpose and existed already (i.e. 
> not part of this patch).

It is not about the multiplex AST consumer. It is about WrappedASTConsumer, 
which is designed for plugins. It is also a private member function of 
FrontendAction, the base of all frontend actions. I think we should implement 
new behaviors in sub-actions. It does not look good to perform semantic 
analysis in FrontendAction...

Concretely, I think we need to do this in CodeGenAction.

> 
> > And if we introduce the mechanism to produce BMI for `.cpp`, it implies 
> > that we need to maintain both paths. It is super embracing to me.
> 
> We do not need two mechanisms, .cppm can take the same path as any other 
> suffix.

Then it implies that we need to discard a bunch of existing code that handles 
`.cppm`. Otherwise we'll have two mechanisms.

> 
> > > in the AST consumer on the BMI side doing suitable filtering to eliminate 
> > > the content that is not part of the interface, that is either not needed 
> > > (or in some cases positively unhelpful to consumers).
> 
> > I believe we should do this in ASTWriters.
> 
> I am strongly against doing more semantic work in the AST reader/writer; that 
> is just compounding existing layering violations that are already hurting us.

Agreed at a high level. But that requires us to implement new AST writers at 
the very least.

> 
> > Also this should be part of thin BMI.
> 
> I am not sure what you mean here - the full AST is required for code-gen - we 
> can only thin AST either on a separate path (as in this patch) or as a 
> separate step.

I mean it should be a successor of 
https://github.com/llvm/llvm-project/pull/71622. Concretely, now we reduce the 
function definition in 
https://github.com/llvm/llvm-project/pull/71622/files#diff-125f472e690aa3d973bc42aa3c5d580226c5c47661551aca2889f960681aa64dR321.



https://github.com/llvm/llvm-project/pull/71773


[clang-tools-extra] [clang-tidy] add modernize-use-std-numbers (PR #66583)

2023-11-09 Thread Piotr Zegar via cfe-commits

https://github.com/PiotrZSL edited 
https://github.com/llvm/llvm-project/pull/66583


[clang] Llvm modules on demand bmi (PR #71773)

2023-11-09 Thread Boris Kolpackov via cfe-commits

boris-kolpackov wrote:

>clang++  -std=c++20 foo.cpp -c -fmodule-file=X=some/dir/X.pcm

Hm, according to https://clang.llvm.org/docs/StandardCPlusPlusModules.html this 
can already be achieved with the `-fmodule-output` option (which I was about 
to try in `build2`). Is there a reason a different option is used for what 
seems to be the same functionality? Or am I missing something here?

> This is the main point of the patch - to do this efficiently.

Again, just want to clarify: as I understand it, this patch solves the scaling 
issue Ben reported (https://github.com/llvm/llvm-project/issues/60996) but 
without the thin/fat BMI complications, correct?

https://github.com/llvm/llvm-project/pull/71773


[clang] [flang] [libcxx] [llvm] [compiler-rt] [clang-tools-extra] [BOLT] Read .rela.dyn in static non-pie binary (PR #71635)

2023-11-09 Thread Vladislav Khmelevsky via cfe-commits

https://github.com/yota9 updated https://github.com/llvm/llvm-project/pull/71635

>From 1006708c3cff79b9504beb26ea82cadaec3bb594 Mon Sep 17 00:00:00 2001
From: Vladislav Khmelevsky 
Date: Wed, 8 Nov 2023 11:57:16 +0400
Subject: [PATCH] [BOLT] Read .rela.dyn in static non-pie binary

Static non-pie binary doesn't have DYNAMIC segment and BOLT skips
reading .rela.dyn segment because of it. But such binaries might have
this section for example to store IFUNC relocation which is resolved
by linked-in startup files, so force reading this section for static
executables.
---
 bolt/include/bolt/Rewrite/RewriteInstance.h |  1 +
 bolt/lib/Rewrite/RewriteInstance.cpp| 13 +++
 bolt/test/AArch64/ifunc.c   | 24 +++--
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/bolt/include/bolt/Rewrite/RewriteInstance.h 
b/bolt/include/bolt/Rewrite/RewriteInstance.h
index 2a421c5cfaa4f89..6e9af61d76e30f6 100644
--- a/bolt/include/bolt/Rewrite/RewriteInstance.h
+++ b/bolt/include/bolt/Rewrite/RewriteInstance.h
@@ -421,6 +421,7 @@ class RewriteInstance {
 
   /// Common section names.
   static StringRef getEHFrameSectionName() { return ".eh_frame"; }
+  static StringRef getRelaDynSectionName() { return ".rela.dyn"; }
 
   /// An instance of the input binary we are processing, externally owned.
   llvm::object::ELFObjectFileBase *InputFile;
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp 
b/bolt/lib/Rewrite/RewriteInstance.cpp
index abdbb79e8eb60ef..2d7df15025e3685 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -2139,6 +2139,19 @@ void RewriteInstance::processDynamicRelocations() {
   }
 
   // The rest of dynamic relocations - DT_RELA.
+  // The static executable might have .rela.dyn secion and not have PT_DYNAMIC
+  if (!DynamicRelocationsSize && BC->IsStaticExecutable) {
+ErrorOr DynamicRelSectionOrErr =
+BC->getUniqueSectionByName(getRelaDynSectionName());
+if (DynamicRelSectionOrErr) {
+  DynamicRelocationsAddress = DynamicRelSectionOrErr->getAddress();
+  DynamicRelocationsSize = DynamicRelSectionOrErr->getSize();
+  const SectionRef &SectionRef = DynamicRelSectionOrErr->getSectionRef();
+  DynamicRelativeRelocationsCount = std::distance(
+  SectionRef.relocation_begin(), SectionRef.relocation_end());
+}
+  }
+
   if (DynamicRelocationsSize > 0) {
 ErrorOr DynamicRelSectionOrErr =
 BC->getSectionForAddress(*DynamicRelocationsAddress);
diff --git a/bolt/test/AArch64/ifunc.c b/bolt/test/AArch64/ifunc.c
index dea2cf6bd543f0a..8edb913ee70d5c0 100644
--- a/bolt/test/AArch64/ifunc.c
+++ b/bolt/test/AArch64/ifunc.c
@@ -7,6 +7,20 @@
 // RUN: llvm-bolt %t.O0.exe -o %t.O0.bolt.exe \
 // RUN:   --print-disasm --print-only=_start | \
 // RUN:   FileCheck --check-prefix=O0_CHECK %s
+// RUN: llvm-readelf -aW %t.O0.bolt.exe | \
+// RUN:   FileCheck --check-prefix=REL_CHECK %s
+
+// Non-pie static executable doesn't generate PT_DYNAMIC, check relocation
+// is readed successfully and IPLT trampoline has been identified by bolt.
+// RUN: %clang %cflags -nostdlib -O3 %s -fuse-ld=lld -no-pie \
+// RUN:   -o %t.O3_nopie.exe -Wl,-q
+// RUN: llvm-readelf -l %t.O3_nopie.exe | \
+// RUN:   FileCheck --check-prefix=NON_DYN_CHECK %s
+// RUN: llvm-bolt %t.O3_nopie.exe -o %t.O3_nopie.bolt.exe  \
+// RUN:   --print-disasm --print-only=_start | \
+// RUN:   FileCheck --check-prefix=O3_CHECK %s
+// RUN: llvm-readelf -aW %t.O3_nopie.bolt.exe | \
+// RUN:   FileCheck --check-prefix=REL_CHECK %s
 
 // With -O3 direct call is performed on IPLT trampoline. IPLT trampoline
 // doesn't have associated symbol. The ifunc symbol has the same address as
@@ -16,6 +30,8 @@
 // RUN: llvm-bolt %t.O3_pie.exe -o %t.O3_pie.bolt.exe  \
 // RUN:   --print-disasm --print-only=_start | \
 // RUN:   FileCheck --check-prefix=O3_CHECK %s
+// RUN: llvm-readelf -aW %t.O3_pie.bolt.exe | \
+// RUN:   FileCheck --check-prefix=REL_CHECK %s
 
 // Check that IPLT trampoline located in .plt section are normally handled by
 // BOLT. The gnu-ld linker doesn't use separate .iplt section.
@@ -24,12 +40,16 @@
 // RUN: llvm-bolt %t.iplt_O3_pie.exe -o %t.iplt_O3_pie.bolt.exe  \
 // RUN:   --print-disasm --print-only=_start  | \
 // RUN:   FileCheck --check-prefix=O3_CHECK %s
+// RUN: llvm-readelf -aW %t.iplt_O3_pie.bolt.exe | \
+// RUN:   FileCheck --check-prefix=REL_CHECK %s
+
+// NON_DYN_CHECK-NOT: DYNAMIC
 
 // O0_CHECK: adr x{{[0-9]+}}, ifoo
 // O3_CHECK: b "{{resolver_foo|ifoo}}{{.*}}@PLT"
 
-#include 
-#include 
+// REL_CHECK: R_AARCH64_IRELATIVE [[#%x,REL_SYMB_ADDR:]]
+// REL_CHECK: [[#REL_SYMB_ADDR]] {{.*}} FUNC {{.*}} resolver_foo
 
 static void foo() {}
 



[clang] Llvm modules on demand bmi (PR #71773)

2023-11-09 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

> > clang++  -std=c++20 foo.cpp -c -fmodule-file=X=some/dir/X.pcm
> 
> Hm, according to https://clang.llvm.org/docs/StandardCPlusPlusModules.html 
> this can already be achieved with the `-fmodule-output` option (and which I 
> was about to try in `build2`). Is there a reason a different option is used 
> for what seems to be the same functionality. Or am I missing something here?
> 
> > This is the main point of the patch - to do this efficiently.
> 
> Again, just want to clarify: as I understand it, this patch solves the 
> scaling issue Ben reported (#60996) but without the thin/fat BMI 
> complications, correct?

The difference is in efficiency; the interfaces don't change much. Previously, 
in the one-phase compilation mode, what clang actually did was:

```
x.cppm -> x.pcm -> x.o
```

That is, we compile `x.o` from `x.pcm`, which involves reading the BMI back 
in. The goal of the patch is to remove that reading step.

https://github.com/llvm/llvm-project/pull/71773


[clang-tools-extra] [clang-tidy] Improve `container-data-pointer` check to use `c_str()` (PR #71304)

2023-11-09 Thread Piotr Zegar via cfe-commits


@@ -3,13 +3,9 @@
 readability-container-data-pointer
 ==
 
-Finds cases where code could use ``data()`` rather than the address of the
-element at index 0 in a container. This pattern is commonly used to materialize
-a pointer to the backing data of a container. ``std::vector`` and
-``std::string`` provide a ``data()`` accessor to retrieve the data pointer 
which
-should be preferred.
+Finds cases where code references the address of the element at index 0 in a 
container and replaces them with calls to ``data()`` or ``c_str()``.

PiotrZSL wrote:

Still not wrapped at 80 columns.

https://github.com/llvm/llvm-project/pull/71304


[clang-tools-extra] [clang-tidy] Improve `container-data-pointer` check to use `c_str()` (PR #71304)

2023-11-09 Thread Piotr Zegar via cfe-commits


@@ -111,16 +115,18 @@ void ContainerDataPointerCheck::check(const 
MatchFinder::MatchResult &Result) {
MemberExpr>(CE))
 ReplacementText = "(" + ReplacementText + ")";
 
-  if (CE->getType()->isPointerType())
-ReplacementText += "->data()";
-  else
-ReplacementText += ".data()";
+  ReplacementText += CE->getType()->isPointerType() ? "->" : ".";
+  ReplacementText += CStrMethod ? "c_str()" : "data()";
+
+  std::string Description =

PiotrZSL wrote:

use llvm::StringRef instead of std::string here

https://github.com/llvm/llvm-project/pull/71304


[clang] [clang] Refactor `IdentifierInfo::ObjcOrBuiltinID` (PR #71709)

2023-11-09 Thread Vlad Serebrennikov via cfe-commits


@@ -86,19 +87,26 @@ enum { IdentifierInfoAlignment = 8 };
 static constexpr int ObjCOrBuiltinIDBits = 16;
 
 /// The "layout" of ObjCOrBuiltinID is:
-///  - The first value (0) represents "not a special identifier".
-///  - The next (NUM_OBJC_KEYWORDS - 1) values represent ObjCKeywordKinds (not
-///including objc_not_keyword).
-///  - The next (NUM_INTERESTING_IDENTIFIERS - 1) values represent
-///InterestingIdentifierKinds (not including not_interesting).
-///  - The rest of the values represent builtin IDs (not including NotBuiltin).
-static constexpr int FirstObjCKeywordID = 1;
-static constexpr int LastObjCKeywordID =
-FirstObjCKeywordID + tok::NUM_OBJC_KEYWORDS - 2;
-static constexpr int FirstInterestingIdentifierID = LastObjCKeywordID + 1;
-static constexpr int LastInterestingIdentifierID =
-FirstInterestingIdentifierID + tok::NUM_INTERESTING_IDENTIFIERS - 2;
-static constexpr int FirstBuiltinID = LastInterestingIdentifierID + 1;
+///  - ObjCKeywordKind enumerators
+///  - InterestingIdentifierKind enumerators
+///  - Builtin::ID enumerators
+///  - NonSpecialIdentifier
+enum class ObjCKeywordOrInterestingOrBuiltin {
+#define OBJC_AT_KEYWORD(X) objc_##X,
+#include "clang/Basic/TokenKinds.def"
+  NUM_OBJC_KEYWORDS,

Endilll wrote:

Not just this enumerator, but all ObjC keywords and interesting identifiers. I 
consider this a feature, actually, because debuggers would show an enumerator 
name that both makes sense and is useful while displaying `ObjCOrBuiltinID` 
bit-fields. Having `ObjCKeywordOrInterestingOrBuiltin` as a scoped enum should 
prevent name collisions with any other enumeration.
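
A small sketch of the scoped-enum point, under stated assumptions: `DEMO_OBJC_KEYWORDS`, `DemoKind`, `UnrelatedKind`, `DemoIdentifierInfo`, and `fromRaw` are hypothetical stand-ins, and the X-macro only imitates how `TokenKinds.def` is consumed in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the OBJC_AT_KEYWORD entries of TokenKinds.def.
#define DEMO_OBJC_KEYWORDS(X) X(class) X(protocol) X(end)

// Scoped enum: every enumerator must be spelled DemoKind::..., so names such
// as NUM_OBJC_KEYWORDS cannot clash with enumerators of other enums.
enum class DemoKind : std::uint16_t {
#define DEMO_ENUMERATOR(Name) objc_##Name,
  DEMO_OBJC_KEYWORDS(DEMO_ENUMERATOR)
#undef DEMO_ENUMERATOR
  NUM_OBJC_KEYWORDS,
  NonSpecialIdentifier
};

// A second enum reusing the same enumerator names compiles without conflict.
enum class UnrelatedKind { NUM_OBJC_KEYWORDS, objc_end };

// Declaring the bit-field with the enum type keeps the enumerator names
// attached to the stored value.
struct DemoIdentifierInfo {
  DemoKind ObjCOrBuiltinID : 16;
};

// Converts a raw stored value back to the enum, rejecting out-of-range input.
DemoKind fromRaw(unsigned Raw) {
  assert(Raw <= static_cast<unsigned>(DemoKind::NonSpecialIdentifier) &&
         "raw value outside the enum's range");
  return static_cast<DemoKind>(Raw);
}
```

Whether a given debugger prints the enumerator name for such a bit-field depends on the debugger and the debug info; the sketch only shows that the declarations coexist.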

https://github.com/llvm/llvm-project/pull/71709


[clang-tools-extra] [clangd] Use InitLLVM (PR #69119)

2023-11-09 Thread Haojian Wu via cfe-commits

https://github.com/hokein approved this pull request.


https://github.com/llvm/llvm-project/pull/69119


[clang] [clang] Refactor `IdentifierInfo::ObjcOrBuiltinID` (PR #71709)

2023-11-09 Thread Vlad Serebrennikov via cfe-commits

Endilll wrote:

> Oh, I didn't look into the identifier's system before. I took a while to look 
> at the patch but I failed to understand it and I failed to find the 
> relationships between this patch and header units...

Yeah, the part this PR touches is not the most straightforward one. Thank you 
for your time!

https://github.com/llvm/llvm-project/pull/71709


[clang] [clang] Do not clear FP pragma stack when instantiating functions (PR #70646)

2023-11-09 Thread Tobias Hieta via cfe-commits

tru wrote:

Can this be merged and ready for a backport next week?

https://github.com/llvm/llvm-project/pull/70646


[llvm] [clang] [InstCombine] Infer zext nneg flag (PR #71534)

2023-11-09 Thread via cfe-commits

mikaelholmen wrote:

I think this patch causes miscompiles. Reproduce with
```opt bbi-88690.ll -passes=instcombine -S -o -```
So with this patch instcombine turns
```
@v_936 = global i16 -3276, align 1
@v_937 = global i24 0, align 1

define i16 @main() {
entry:
  %0 = load i16, ptr @v_936, align 1
  %unsclear = and i16 %0, 32767
  %resize = zext i16 %unsclear to i24
  %unsclear1 = and i24 %resize, 8388607
  store i24 %unsclear1, ptr @v_937, align 1
  ret i16 0
}
```
into
```
@v_936 = global i16 -3276, align 1
@v_937 = global i24 0, align 1

define i16 @main() {
entry:
  %0 = load i16, ptr @v_936, align 1
  %resize = zext nneg i16 %0 to i24
  store i24 %resize, ptr @v_937, align 1
  ret i16 0
}
```
I.e. the `and` with 32767 (0x7fff) is gone and instead the zext got "nneg"?
But the value in v_936 can be, and actually _is_ negative.
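
The arithmetic can be checked with a small model, offered as a sketch rather than as what InstCombine computes internally. The function names are hypothetical, and i24 is modelled as a `uint32_t` masked to 24 bits.

```cpp
#include <cassert>
#include <cstdint>

// Models the original IR: and i16 %0, 32767; zext to i24; and i24 ..., 8388607.
std::uint32_t originalSequence(std::int16_t Loaded) {
  const std::uint16_t Cleared =
      static_cast<std::uint16_t>(static_cast<std::uint16_t>(Loaded) & 0x7FFFu);
  const std::uint32_t Resized = Cleared; // zext i16 -> i24
  return Resized & 0x7FFFFFu;
}

// Models the transformed IR if the nneg flag is ignored: a plain zext of the
// loaded value, with no masking of the i16 sign bit.
std::uint32_t zextOnly(std::int16_t Loaded) {
  return static_cast<std::uint16_t>(Loaded);
}

// zext nneg is only well defined when the source value is non-negative;
// for a negative source the result is poison.
bool nnegPreconditionHolds(std::int16_t Loaded) { return Loaded >= 0; }

// Checks the value from the report: @v_936 holds -3276 (0xF334 as i16).
void checkReportedValue() {
  const std::int16_t V936 = -3276;
  assert(originalSequence(V936) == 29492); // 0x7334, sign bit cleared
  assert(zextOnly(V936) == 62260);         // 0xF334, sign bit kept
  assert(!nnegPreconditionHolds(V936));
}
```

So even setting poison aside, the stored value changes from 29492 to 62260 for this input, and the `nneg` precondition does not hold for it.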

[bbi-88690.ll.gz](https://github.com/llvm/llvm-project/files/13306009/bbi-88690.ll.gz)


https://github.com/llvm/llvm-project/pull/71534


[llvm] [clang-tools-extra] [clang] [PowerPC] Check value uses in ValueBit tracking (PR #66040)

2023-11-09 Thread Qiu Chaofan via cfe-commits

https://github.com/ecnelises updated 
https://github.com/llvm/llvm-project/pull/66040

>From ebaafdd6d45bb62b1847e60df627dfd96971a22c Mon Sep 17 00:00:00 2001
From: Qiu Chaofan 
Date: Tue, 12 Sep 2023 10:39:55 +0800
Subject: [PATCH] [PowerPC] Check value uses in ValueBit tracking

---
 llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp   | 162 +++---
 llvm/test/CodeGen/PowerPC/int128_ldst.ll  |  18 +-
 .../PowerPC/loop-instr-form-prepare.ll|   6 +-
 llvm/test/CodeGen/PowerPC/prefer-dqform.ll|   4 +-
 llvm/test/CodeGen/PowerPC/rldimi.ll   |  19 +-
 5 files changed, 117 insertions(+), 92 deletions(-)

diff --git a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp 
b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
index b57d185bb638b8c..8af50b10d3c7e1d 100644
--- a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
@@ -1630,30 +1630,41 @@ class BitPermutationSelector {
 bool &Interesting = ValueEntry->first;
 SmallVector &Bits = ValueEntry->second;
 Bits.resize(NumBits);
+SDValue LHS = V.getNumOperands() > 0 ? V.getOperand(0) : SDValue();
+SDValue RHS = V.getNumOperands() > 1 ? V.getOperand(1) : SDValue();
 
 switch (V.getOpcode()) {
 default: break;
 case ISD::ROTL:
-  if (isa(V.getOperand(1))) {
+  if (isa(RHS)) {
 unsigned RotAmt = V.getConstantOperandVal(1);
 
-const auto &LHSBits = *getValueBits(V.getOperand(0), NumBits).second;
-
-for (unsigned i = 0; i < NumBits; ++i)
-  Bits[i] = LHSBits[i < RotAmt ? i + (NumBits - RotAmt) : i - RotAmt];
+if (LHS.hasOneUse()) {
+  const auto &LHSBits = *getValueBits(LHS, NumBits).second;
+  for (unsigned i = 0; i < NumBits; ++i)
+Bits[i] = LHSBits[i < RotAmt ? i + (NumBits - RotAmt) : i - 
RotAmt];
+} else {
+  for (unsigned i = 0; i < NumBits; ++i)
+Bits[i] =
+ValueBit(LHS, i < RotAmt ? i + (NumBits - RotAmt) : i - 
RotAmt);
+}
 
 return std::make_pair(Interesting = true, &Bits);
   }
   break;
 case ISD::SHL:
 case PPCISD::SHL:
-  if (isa(V.getOperand(1))) {
+  if (isa(RHS)) {
 unsigned ShiftAmt = V.getConstantOperandVal(1);
 
-const auto &LHSBits = *getValueBits(V.getOperand(0), NumBits).second;
-
-for (unsigned i = ShiftAmt; i < NumBits; ++i)
-  Bits[i] = LHSBits[i - ShiftAmt];
+if (LHS.hasOneUse()) {
+  const auto &LHSBits = *getValueBits(LHS, NumBits).second;
+  for (unsigned i = ShiftAmt; i < NumBits; ++i)
+Bits[i] = LHSBits[i - ShiftAmt];
+} else {
+  for (unsigned i = ShiftAmt; i < NumBits; ++i)
+Bits[i] = ValueBit(LHS, i - ShiftAmt);
+}
 
 for (unsigned i = 0; i < ShiftAmt; ++i)
   Bits[i] = ValueBit(ValueBit::ConstZero);
@@ -1663,13 +1674,17 @@ class BitPermutationSelector {
   break;
 case ISD::SRL:
 case PPCISD::SRL:
-  if (isa(V.getOperand(1))) {
+  if (isa(RHS)) {
 unsigned ShiftAmt = V.getConstantOperandVal(1);
 
-const auto &LHSBits = *getValueBits(V.getOperand(0), NumBits).second;
-
-for (unsigned i = 0; i < NumBits - ShiftAmt; ++i)
-  Bits[i] = LHSBits[i + ShiftAmt];
+if (LHS.hasOneUse()) {
+  const auto &LHSBits = *getValueBits(LHS, NumBits).second;
+  for (unsigned i = 0; i < NumBits - ShiftAmt; ++i)
+Bits[i] = LHSBits[i + ShiftAmt];
+} else {
+  for (unsigned i = 0; i < NumBits - ShiftAmt; ++i)
+Bits[i] = ValueBit(LHS, i + ShiftAmt);
+}
 
 for (unsigned i = NumBits - ShiftAmt; i < NumBits; ++i)
   Bits[i] = ValueBit(ValueBit::ConstZero);
@@ -1678,23 +1693,27 @@ class BitPermutationSelector {
   }
   break;
 case ISD::AND:
-  if (isa(V.getOperand(1))) {
+  if (isa(RHS)) {
 uint64_t Mask = V.getConstantOperandVal(1);
 
-const SmallVector *LHSBits;
+const SmallVector *LHSBits = nullptr;
 // Mark this as interesting, only if the LHS was also interesting. This
 // prevents the overall procedure from matching a single immediate 
'and'
 // (which is non-optimal because such an and might be folded with other
 // things if we don't select it here).
-std::tie(Interesting, LHSBits) = getValueBits(V.getOperand(0), 
NumBits);
+if (LHS.hasOneUse())
+  std::tie(Interesting, LHSBits) = getValueBits(LHS, NumBits);
 
 for (unsigned i = 0; i < NumBits; ++i)
-  if (((Mask >> i) & 1) == 1)
-Bits[i] = (*LHSBits)[i];
-  else {
+  if (((Mask >> i) & 1) == 1) {
+if (LHS.hasOneUse())
+  Bits[i] = (*LHSBits)[i];
+else
+  Bits[i] = ValueBit(LHS, i);
+  } else {
 // AND instruction masks this bit. If the input is already zero,
 // we have nothing to 

[llvm] [clang-tools-extra] [clang] [PowerPC] Check value uses in ValueBit tracking (PR #66040)

2023-11-09 Thread Qiu Chaofan via cfe-commits

ecnelises wrote:

Gentle ping... any comments?

https://github.com/llvm/llvm-project/pull/66040


[clang] [clang][analyzer] Improve StdLibraryFunctionsChecker 'readlink' modeling. (PR #71373)

2023-11-09 Thread Balázs Kéri via cfe-commits

balazske wrote:

I tested on vim and the problematic report disappeared; no other changes were 
detected.

https://github.com/llvm/llvm-project/pull/71373


[clang] [clang][analyzer] Improve StdLibraryFunctionsChecker 'readlink' modeling. (PR #71373)

2023-11-09 Thread Balázs Kéri via cfe-commits

balazske wrote:

The checker was already tested on some projects, but much more testing is 
needed to find such corner cases. It may be better to manually check the 
functions for cases where a 0 return value is not possible, or is possible 
only in a special (known) case.

https://github.com/llvm/llvm-project/pull/71373


[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/71739


[openmp] [clang] [llvm] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr commented:

I have only briefly looked at the NVPTX implementation.

https://github.com/llvm/llvm-project/pull/71739


[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits


@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, 
AMDGenericDeviceTy {
   using AMDGPUEventRef = AMDGPUResourceRef;
   using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy;
 
+  /// Common method to invoke a single threaded constructor or destructor
+  /// kernel by name.
+  Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image,
+ const char *Name) {
+// Perform a quick check for the named kernel in the image. The kernel
+// should be created by the 'amdgpu-lower-ctor-dtor' pass.
+GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+GlobalTy Global(Name, sizeof(void *));
+if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
+  consumeError(std::move(Err));
+  return Error::success();

jplehr wrote:

Is there a specific reason we do not return the error here, but instead consume 
and return success?

Also, I think this should be `Plugin::success()` to not deviate from what is 
used in the plugin.

https://github.com/llvm/llvm-project/pull/71739


[llvm] [clang] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits


@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, 
AMDGenericDeviceTy {
   using AMDGPUEventRef = AMDGPUResourceRef;
   using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy;
 
+  /// Common method to invoke a single threaded constructor or destructor
+  /// kernel by name.
+  Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image,
+ const char *Name) {
+// Perform a quick check for the named kernel in the image. The kernel
+// should be created by the 'amdgpu-lower-ctor-dtor' pass.
+GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+GlobalTy Global(Name, sizeof(void *));
+if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
+  consumeError(std::move(Err));
+  return Error::success();
+}
+
+// Allocate and construct the AMDGPU kernel.
+GenericKernelTy *AMDGPUKernel = Plugin.allocate();
+if (!AMDGPUKernel)
+  return Plugin::error("Failed to allocate memory for AMDGPU kernel");
+
+new (AMDGPUKernel) AMDGPUKernelTy(Name);
+if (auto Err = AMDGPUKernel->initImpl(*this, Image))
+  return std::move(Err);
+
+auto *AsyncInfoPtr = Plugin.allocate<__tgt_async_info>();
+AsyncInfoWrapperTy AsyncInfoWrapper(*this, AsyncInfoPtr);
+
+if (auto Err = initAsyncInfoImpl(AsyncInfoWrapper))
+  return std::move(Err);
+
+KernelArgsTy KernelArgs = {};
+if (auto Err = AMDGPUKernel->launchImpl(*this, /*NumThread=*/1u,
+/*NumBlocks=*/1ul, KernelArgs,
+/*Args=*/nullptr, 
AsyncInfoWrapper))
+  return std::move(Err);
+
+if (auto Err = synchronize(AsyncInfoPtr))
+  return std::move(Err);
+Error Err = Error::success();

jplehr wrote:

Should this be `Plugin::success()` instead here as well?

https://github.com/llvm/llvm-project/pull/71739


[clang] [Clang][Sema] Fix qualifier restriction of overriden methods (PR #71696)

2023-11-09 Thread Qiu Chaofan via cfe-commits


@@ -289,3 +289,29 @@ namespace PR8168 {
 static void foo() {} // expected-error{{'static' member function 'foo' 
overrides a virtual function}}
   };
 }
+
+namespace T13 {
+  class A {
+  public:
+virtual const int* foo(); // expected-note{{overridden virtual function is 
here}}
+  };
+
+  class B: public A {
+  public:
+virtual int* foo(); // expected-error{{return type of virtual function 
'foo' is not covariant with the return type of the function it overrides ('int 
*' has different qualifiers than 'const int *')}}
+  };
+}
+
+namespace T14 {
+  struct a {};
+
+  class A {
+  public:
+virtual const a* foo(); // expected-note{{overridden virtual function is 
here}}
+  };
+
+  class B: public A {
+  public:
+virtual volatile a* foo(); // expected-error{{return type of virtual 
function 'foo' is not covariant with the return type of the function it 
overrides (class type 'volatile a *' is more qualified than class type 'const a 
*')}}

ecnelises wrote:

Hmm, right, we can't say whether `volatile` is more qualified than `const` or 
not. But `virtual volatile a* foo(); ... virtual a* foo() override;` is 
acceptable as long as `a` is a class type, so saying `has different 
qualifiers` also looks inaccurate.
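
A compilable sketch of the acceptable case mentioned above. `Widget`, `Base`, `Derived`, and `readThroughBase` are hypothetical names, not part of the test file.

```cpp
#include <cassert>

struct Widget {
  int Value = 7;
};

struct Base {
  virtual ~Base() = default;
  virtual volatile Widget *get() { return &W; }
  Widget W;
};

struct Derived : Base {
  // Accepted: both return types are pointers to the same class, and the
  // class type here (Widget) is less cv-qualified than in Base::get
  // (volatile Widget), so the return types are covariant.
  Widget *get() override { return &W; }
};

// By contrast, overriding `virtual const int *foo();` with `int *foo();` is
// rejected: int is not a class type, so covariance does not apply and the
// return types would have to be identical.

// Reads the widget through the base-class interface.
int readThroughBase(Base &B) {
  volatile Widget *P = B.get();
  assert(P != nullptr && "get() must return a valid object");
  return P->Value;
}
```

The sketch only covers dropping a qualifier; it does not settle how the diagnostic should word the `const` versus `volatile` case discussed in the review.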

https://github.com/llvm/llvm-project/pull/71696


[clang] 0f7aaeb - [C++20] [Modules] Allow export from language linkage

2023-11-09 Thread Chuanqi Xu via cfe-commits

Author: Chuanqi Xu
Date: 2023-11-09T17:44:41+08:00
New Revision: 0f7aaeb3241c3803489a45753190e82dbc7fd5fa

URL: 
https://github.com/llvm/llvm-project/commit/0f7aaeb3241c3803489a45753190e82dbc7fd5fa
DIFF: 
https://github.com/llvm/llvm-project/commit/0f7aaeb3241c3803489a45753190e82dbc7fd5fa.diff

LOG: [C++20] [Modules] Allow export from language linkage

Close https://github.com/llvm/llvm-project/issues/71347

Previously I misread the concept of module purview. I thought that if a
declaration is attached to an unnamed module, it can't be part of the module
purview. But after the issue report, I recognized that module purview is
more of a concept about locations than semantics.

Concretely, the things in the language linkage after module declarations
can be exported.

This patch refactors `Module::isModulePurview()` and introduces some
possible code cleanups.

Added: 


Modified: 
clang/include/clang/Basic/Module.h
clang/include/clang/Lex/ModuleMap.h
clang/include/clang/Sema/Sema.h
clang/include/clang/Serialization/ASTWriter.h
clang/lib/AST/ASTContext.cpp
clang/lib/AST/Decl.cpp
clang/lib/AST/DeclBase.cpp
clang/lib/CodeGen/CodeGenModule.cpp
clang/lib/Frontend/ASTUnit.cpp
clang/lib/Lex/ModuleMap.cpp
clang/lib/Sema/Sema.cpp
clang/lib/Sema/SemaDecl.cpp
clang/lib/Sema/SemaDeclCXX.cpp
clang/lib/Sema/SemaLookup.cpp
clang/lib/Sema/SemaModule.cpp
clang/lib/Serialization/ASTWriterDecl.cpp
clang/test/Modules/export-language-linkage.cppm
clang/test/SemaCXX/modules.cppm

Removed: 




diff  --git a/clang/include/clang/Basic/Module.h 
b/clang/include/clang/Basic/Module.h
index 239eb5a637f3ecf..08b153e8c1c9d33 100644
--- a/clang/include/clang/Basic/Module.h
+++ b/clang/include/clang/Basic/Module.h
@@ -178,9 +178,8 @@ class alignas(8) Module {
   /// eventually be exposed, for use in "private" modules.
   std::string ExportAsModule;
 
-  /// Does this Module scope describe part of the purview of a standard named
-  /// C++ module?
-  bool isModulePurview() const {
+  /// Does this Module is a named module of a standard named module?
+  bool isNamedModule() const {
 switch (Kind) {
 case ModuleInterfaceUnit:
 case ModuleImplementationUnit:

diff  --git a/clang/include/clang/Lex/ModuleMap.h 
b/clang/include/clang/Lex/ModuleMap.h
index d5824713970ea7b..32e7e8f899e502c 100644
--- a/clang/include/clang/Lex/ModuleMap.h
+++ b/clang/include/clang/Lex/ModuleMap.h
@@ -556,8 +556,8 @@ class ModuleMap {
   /// parent.
   Module *createGlobalModuleFragmentForModuleUnit(SourceLocation Loc,
   Module *Parent = nullptr);
-  Module *createImplicitGlobalModuleFragmentForModuleUnit(
-  SourceLocation Loc, bool IsExported, Module *Parent = nullptr);
+  Module *createImplicitGlobalModuleFragmentForModuleUnit(SourceLocation Loc,
+  Module *Parent);
 
   /// Create a global module fragment for a C++ module interface unit.
   Module *createPrivateModuleFragmentForInterfaceUnit(Module *Parent,

diff  --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index fe8b387f198c56e..63d548c30da7f6e 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -2317,14 +2317,9 @@ class Sema final {
   clang::Module *TheGlobalModuleFragment = nullptr;
 
   /// The implicit global module fragments of the current translation unit.
-  /// We would only create at most two implicit global module fragments to
-  /// avoid performance penalties when there are many language linkage
-  /// exports.
   ///
-  /// The contents in the implicit global module fragment can't be discarded
-  /// no matter if it is exported or not.
+  /// The contents in the implicit global module fragment can't be discarded.
   clang::Module *TheImplicitGlobalModuleFragment = nullptr;
-  clang::Module *TheExportedImplicitGlobalModuleFragment = nullptr;
 
   /// Namespace definitions that we will export when they finish.
   llvm::SmallPtrSet DeferredExportedNamespaces;
@@ -2336,9 +2331,7 @@ class Sema final {
 
   /// Helper function to judge if we are in module purview.
   /// Return false if we are not in a module.
-  bool isCurrentModulePurview() const {
-return getCurrentModule() ? getCurrentModule()->isModulePurview() : false;
-  }
+  bool isCurrentModulePurview() const;
 
   /// Enter the scope of the explicit global module fragment.
   Module *PushGlobalModuleFragment(SourceLocation BeginLoc);
@@ -2346,8 +2339,7 @@ class Sema final {
   void PopGlobalModuleFragment();
 
   /// Enter the scope of an implicit global module fragment.
-  Module *PushImplicitGlobalModuleFragment(SourceLocation BeginLoc,
-   bool IsExported);
+  Module *PushImplicitGlobalModuleFragment(SourceLocation BeginLoc);
   /// Leave the scope of an implicit

[clang] [llvm] [InstCombine] Infer zext nneg flag (PR #71534)

2023-11-09 Thread via cfe-commits

dyung wrote:

We also have a couple of internal tests that seem to be failing after this 
commit. Consider the following code:
```c++
char print_tmp[1];
void print(char *, void *data, unsigned size) {
  unsigned char *bytes = (unsigned char *)data;
  for (unsigned i = 0; i != size; ++i)
    sprintf(print_tmp + i * 2, "%02x", bytes[size - 1 - i]);
  printf(print_tmp);
}
#define PRINT(VAR) print(#VAR, &VAR, sizeof(VAR))
struct {
  long b : 17;
} test141_struct_id29534;
struct test141_struct_id29574_ {
  test141_struct_id29574_() { INIT(172, *this); }
  unsigned a : 15;
} test141_struct_id29574;
int main() {
  long id29692 = test141_struct_id29534.b = test141_struct_id29574.a;
  PRINT(id29692);
}
```
When compiled without optimizations (and, before this change, with 
optimizations) it prints the value `2dac`. After this change, with 
optimizations enabled, the program prints `adac` instead.

You can see the difference at https://godbolt.org/z/vjPvGT5G9.
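
For readers who want the expected semantics in isolation, here is a minimal sketch. It is not the internal test: the struct names `Narrow` and `Wide` and the helper `widen` are hypothetical, and the value `0x2dac` is simply the one from the printed output above. It assumes C++14 or later, where a plain `long` bit-field is signed. Copying a 15-bit unsigned field into a 17-bit signed field must preserve the value, because every 15-bit unsigned value fits in 17 signed bits.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two bit-field structs in the report.
struct Narrow {
  unsigned a : 15; // holds 0 .. 32767
};
struct Wide {
  long b : 17; // signed, holds -65536 .. 65535
};

// Store `value` in the 15-bit unsigned field, copy it into the 17-bit
// signed field, and widen the result to a full long.
long widen(unsigned value) {
  Narrow src;
  src.a = value; // keeps the low 15 bits
  Wide dst;
  dst.b = src.a; // value-preserving: the source is never negative
  return dst.b;
}
```

With a correct compiler, `widen(0x2dac)` yields `0x2dac` at every optimization level.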

https://github.com/llvm/llvm-project/pull/71534
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [compiler-rt] [libcxx] [flang] [llvm] [clang-tools-extra] [Clang][Sema] Fix qualifier restriction of overriden methods (PR #71696)

2023-11-09 Thread Qiu Chaofan via cfe-commits

https://github.com/ecnelises updated 
https://github.com/llvm/llvm-project/pull/71696

>From 1d0109b7f370a3689a92e20ab52597b112669e47 Mon Sep 17 00:00:00 2001
From: Qiu Chaofan 
Date: Thu, 9 Nov 2023 00:00:26 +0800
Subject: [PATCH 1/2] [Clang][Sema] Fix qualifier restriction of overriden
 methods

If return type of overriden method is pointer or reference to
non-class type, qualifiers cannot be dropped. This also fixes check
when qualifier of overriden method's class return type is not subset
of super method's.
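
As an illustration only (not part of the patch), the rule being enforced can be sketched as follows. The names `Item`, `Base`, `Derived`, `GlobalItem` and `GlobalNumber` are hypothetical. The assumption here is the usual covariant-return rule: a pointer to class type may be overridden with the same or less cv-qualification, while a pointer to non-class type must match exactly.

```cpp
#include <cassert>

struct Item {};

Item GlobalItem;       // hypothetical objects to point at
int GlobalNumber = 42;

struct Base {
  virtual ~Base() = default;
  virtual const Item *item() { return &GlobalItem; }
  virtual const int *number() { return &GlobalNumber; }
};

struct Derived : Base {
  // OK: same class type, fewer qualifiers than 'const Item *'.
  Item *item() override { return &GlobalItem; }

  // OK: non-class pointee, so the return type must match exactly.
  const int *number() override { return &GlobalNumber; }

  // Would be rejected: drops 'const' from a non-class pointee.
  //   int *number() override;
  // Would also be rejected: 'volatile' is not a subset of 'const'.
  //   volatile Item *item() override;
};
```

The two commented-out declarations correspond to the T13 and T14 cases in the test file.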
---
 .../clang/Basic/DiagnosticSemaKinds.td|  2 +-
 clang/lib/Sema/SemaDeclCXX.cpp| 15 +-
 clang/test/SemaCXX/virtual-override.cpp   | 28 ++-
 3 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td 
b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 18c2e861385e463..e60a7513d54e552 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -2115,7 +2115,7 @@ def err_covariant_return_type_different_qualifications : 
Error<
 def err_covariant_return_type_class_type_more_qualified : Error<
   "return type of virtual function %0 is not covariant with the return type of 
"
   "the function it overrides (class type %1 is more qualified than class "
-  "type %2">;
+  "type %2)">;
 
 // C++ implicit special member functions
 def note_in_declaration_of_implicit_special_member : Note<
diff --git a/clang/lib/Sema/SemaDeclCXX.cpp b/clang/lib/Sema/SemaDeclCXX.cpp
index 60786a880b9d3fd..b2c1f1fff9d7e7b 100644
--- a/clang/lib/Sema/SemaDeclCXX.cpp
+++ b/clang/lib/Sema/SemaDeclCXX.cpp
@@ -18469,7 +18469,7 @@ bool Sema::CheckOverridingFunctionReturnType(const 
CXXMethodDecl *New,
 
 
   // The new class type must have the same or less qualifiers as the old type.
-  if (NewClassTy.isMoreQualifiedThan(OldClassTy)) {
+  if (!OldClassTy.isAtLeastAsQualifiedAs(NewClassTy)) {
 Diag(New->getLocation(),
  diag::err_covariant_return_type_class_type_more_qualified)
 << New->getDeclName() << NewTy << OldTy
@@ -18479,6 +18479,19 @@ bool Sema::CheckOverridingFunctionReturnType(const 
CXXMethodDecl *New,
 return true;
   }
 
+  // Non-class return types should not drop qualifiers in overriden method.
+  if (!OldClassTy->isStructureOrClassType() &&
+  OldClassTy.getLocalCVRQualifiers() !=
+  NewClassTy.getLocalCVRQualifiers()) {
+Diag(New->getLocation(),
+ diag::err_covariant_return_type_different_qualifications)
+<< New->getDeclName() << NewTy << OldTy
+<< New->getReturnTypeSourceRange();
+Diag(Old->getLocation(), diag::note_overridden_virtual_function)
+<< Old->getReturnTypeSourceRange();
+return true;
+  }
+
   return false;
 }
 
diff --git a/clang/test/SemaCXX/virtual-override.cpp 
b/clang/test/SemaCXX/virtual-override.cpp
index 72abfc3cf51e1f7..003f4826a3d6c86 100644
--- a/clang/test/SemaCXX/virtual-override.cpp
+++ b/clang/test/SemaCXX/virtual-override.cpp
@@ -87,7 +87,7 @@ class A {
 
 class B : A {
   virtual a* f(); 
-  virtual const a* g(); // expected-error{{return type of virtual function 'g' 
is not covariant with the return type of the function it overrides (class type 
'const a *' is more qualified than class type 'a *'}}
+  virtual const a* g(); // expected-error{{return type of virtual function 'g' 
is not covariant with the return type of the function it overrides (class type 
'const a *' is more qualified than class type 'a *')}}
 };
 
 }
@@ -289,3 +289,29 @@ namespace PR8168 {
 static void foo() {} // expected-error{{'static' member function 'foo' 
overrides a virtual function}}
   };
 }
+
+namespace T13 {
+  class A {
+  public:
+virtual const int* foo(); // expected-note{{overridden virtual function is 
here}}
+  };
+
+  class B: public A {
+  public:
+virtual int* foo(); // expected-error{{return type of virtual function 
'foo' is not covariant with the return type of the function it overrides ('int 
*' has different qualifiers than 'const int *')}}
+  };
+}
+
+namespace T14 {
+  struct a {};
+
+  class A {
+  public:
+virtual const a* foo(); // expected-note{{overridden virtual function is 
here}}
+  };
+
+  class B: public A {
+  public:
+virtual volatile a* foo(); // expected-error{{return type of virtual 
function 'foo' is not covariant with the return type of the function it 
overrides (class type 'volatile a *' is more qualified than class type 'const a 
*')}}
+  };
+}

>From 5f64fec64b51542abd72a9a870ae9e5fe357d026 Mon Sep 17 00:00:00 2001
From: Qiu Chaofan 
Date: Thu, 9 Nov 2023 17:49:33 +0800
Subject: [PATCH 2/2] Say 'different qualifiers' instead of 'more qualified'

---
 clang/test/SemaCXX/virtual-override.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/clang/test/SemaCXX/virtual-override.cpp 
b/clang/test/SemaCXX/virtual-override.cpp
index 003f4826a3d6c86..3a10e15a663a50a 100644
--- a/clang/test/S

[clang] [compiler-rt] [libcxx] [flang] [llvm] [clang-tools-extra] [Clang][Sema] Fix qualifier restriction of overriden methods (PR #71696)

2023-11-09 Thread Qiu Chaofan via cfe-commits

https://github.com/ecnelises updated 
https://github.com/llvm/llvm-project/pull/71696

>From 1d0109b7f370a3689a92e20ab52597b112669e47 Mon Sep 17 00:00:00 2001
From: Qiu Chaofan 
Date: Thu, 9 Nov 2023 00:00:26 +0800
Subject: [PATCH 1/3] [Clang][Sema] Fix qualifier restriction of overriden
 methods

If return type of overriden method is pointer or reference to
non-class type, qualifiers cannot be dropped. This also fixes check
when qualifier of overriden method's class return type is not subset
of super method's.
---
 .../clang/Basic/DiagnosticSemaKinds.td|  2 +-
 clang/lib/Sema/SemaDeclCXX.cpp| 15 +-
 clang/test/SemaCXX/virtual-override.cpp   | 28 ++-
 3 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td 
b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 18c2e861385e463..e60a7513d54e552 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -2115,7 +2115,7 @@ def err_covariant_return_type_different_qualifications : 
Error<
 def err_covariant_return_type_class_type_more_qualified : Error<
   "return type of virtual function %0 is not covariant with the return type of 
"
   "the function it overrides (class type %1 is more qualified than class "
-  "type %2">;
+  "type %2)">;
 
 // C++ implicit special member functions
 def note_in_declaration_of_implicit_special_member : Note<
diff --git a/clang/lib/Sema/SemaDeclCXX.cpp b/clang/lib/Sema/SemaDeclCXX.cpp
index 60786a880b9d3fd..b2c1f1fff9d7e7b 100644
--- a/clang/lib/Sema/SemaDeclCXX.cpp
+++ b/clang/lib/Sema/SemaDeclCXX.cpp
@@ -18469,7 +18469,7 @@ bool Sema::CheckOverridingFunctionReturnType(const 
CXXMethodDecl *New,
 
 
   // The new class type must have the same or less qualifiers as the old type.
-  if (NewClassTy.isMoreQualifiedThan(OldClassTy)) {
+  if (!OldClassTy.isAtLeastAsQualifiedAs(NewClassTy)) {
 Diag(New->getLocation(),
  diag::err_covariant_return_type_class_type_more_qualified)
 << New->getDeclName() << NewTy << OldTy
@@ -18479,6 +18479,19 @@ bool Sema::CheckOverridingFunctionReturnType(const 
CXXMethodDecl *New,
 return true;
   }
 
+  // Non-class return types should not drop qualifiers in overriden method.
+  if (!OldClassTy->isStructureOrClassType() &&
+  OldClassTy.getLocalCVRQualifiers() !=
+  NewClassTy.getLocalCVRQualifiers()) {
+Diag(New->getLocation(),
+ diag::err_covariant_return_type_different_qualifications)
+<< New->getDeclName() << NewTy << OldTy
+<< New->getReturnTypeSourceRange();
+Diag(Old->getLocation(), diag::note_overridden_virtual_function)
+<< Old->getReturnTypeSourceRange();
+return true;
+  }
+
   return false;
 }
 
diff --git a/clang/test/SemaCXX/virtual-override.cpp 
b/clang/test/SemaCXX/virtual-override.cpp
index 72abfc3cf51e1f7..003f4826a3d6c86 100644
--- a/clang/test/SemaCXX/virtual-override.cpp
+++ b/clang/test/SemaCXX/virtual-override.cpp
@@ -87,7 +87,7 @@ class A {
 
 class B : A {
   virtual a* f(); 
-  virtual const a* g(); // expected-error{{return type of virtual function 'g' 
is not covariant with the return type of the function it overrides (class type 
'const a *' is more qualified than class type 'a *'}}
+  virtual const a* g(); // expected-error{{return type of virtual function 'g' 
is not covariant with the return type of the function it overrides (class type 
'const a *' is more qualified than class type 'a *')}}
 };
 
 }
@@ -289,3 +289,29 @@ namespace PR8168 {
 static void foo() {} // expected-error{{'static' member function 'foo' 
overrides a virtual function}}
   };
 }
+
+namespace T13 {
+  class A {
+  public:
+virtual const int* foo(); // expected-note{{overridden virtual function is 
here}}
+  };
+
+  class B: public A {
+  public:
+virtual int* foo(); // expected-error{{return type of virtual function 
'foo' is not covariant with the return type of the function it overrides ('int 
*' has different qualifiers than 'const int *')}}
+  };
+}
+
+namespace T14 {
+  struct a {};
+
+  class A {
+  public:
+virtual const a* foo(); // expected-note{{overridden virtual function is 
here}}
+  };
+
+  class B: public A {
+  public:
+virtual volatile a* foo(); // expected-error{{return type of virtual 
function 'foo' is not covariant with the return type of the function it 
overrides (class type 'volatile a *' is more qualified than class type 'const a 
*')}}
+  };
+}

>From 5f64fec64b51542abd72a9a870ae9e5fe357d026 Mon Sep 17 00:00:00 2001
From: Qiu Chaofan 
Date: Thu, 9 Nov 2023 17:49:33 +0800
Subject: [PATCH 2/3] Say 'different qualifiers' instead of 'more qualified'

---
 clang/test/SemaCXX/virtual-override.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/clang/test/SemaCXX/virtual-override.cpp 
b/clang/test/SemaCXX/virtual-override.cpp
index 003f4826a3d6c86..3a10e15a663a50a 100644
--- a/clang/test/S

[clang-tools-extra] [llvm] [clang] [CodeGen] Revamp counted_by calculations (PR #70606)

2023-11-09 Thread Bill Wendling via cfe-commits

bwendling wrote:

@rapidsna My recent commits try to address a lot of the issues you brought up. 
If the FAM's array index is negative or out of bounds, it should now catch it 
and return an appropriate value. There may still be some corner cases that have 
to be hammered out, but I'd like to get this in if you feel it's ready, as I 
think the corner cases will occur infrequently.
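
A rough sketch of the kind of bounds handling described above, for illustration only. The helper `remainingBytes` is hypothetical and is not the code in the PR; in particular, this message does not say what the "appropriate value" is, so returning 0 for a negative or out-of-bounds index is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes remaining from element `index` to the end of
// a flexible array member that holds `count` elements of `elemSize` bytes.
// Assumption: invalid inputs (negative count, negative index, index past
// the end) report 0 bytes.
std::size_t remainingBytes(long count, long index, std::size_t elemSize) {
  if (count < 0 || index < 0 || index > count)
    return 0;
  return static_cast<std::size_t>(count - index) * elemSize;
}
```

For example, with 10 four-byte elements, `remainingBytes(10, 3, 4)` is 28, and an index of -1 or 11 gives 0.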

https://github.com/llvm/llvm-project/pull/70606


[llvm] [clang] [InstCombine] Infer zext nneg flag (PR #71534)

2023-11-09 Thread Yingwei Zheng via cfe-commits

dtcxzyw wrote:

Reduced test case: https://godbolt.org/z/d4ETPhbno

https://github.com/llvm/llvm-project/pull/71534


[clang] [AMDGPU] - Add clang builtins for tied WMMA intrinsics (PR #70669)

2023-11-09 Thread Jessica Del via cfe-commits

https://github.com/OutOfCache updated 
https://github.com/llvm/llvm-project/pull/70669

>From 75db77fef715fa5aee10a8384fca299b7bf2b7a3 Mon Sep 17 00:00:00 2001
From: Jessica Del 
Date: Sun, 29 Oct 2023 21:16:52 +0100
Subject: [PATCH] [AMDGPU] - Add clang builtins for tied WMMA intrinsics

Add clang builtins for the new tied wmma intrinsics.
These variations tie the destination
accumulator matrix to the input
accumulator matrix.

Add negative tests for gfx10, since we do not support
the wmma intrinsics before gfx11.
---
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |  4 +++
 clang/lib/CodeGen/CGBuiltin.cpp   | 14 
 .../builtins-amdgcn-wmma-w32-gfx10-err.cl | 34 +++
 .../CodeGenOpenCL/builtins-amdgcn-wmma-w32.cl | 30 
 .../builtins-amdgcn-wmma-w64-gfx10-err.cl | 34 +++
 .../CodeGenOpenCL/builtins-amdgcn-wmma-w64.cl | 30 
 6 files changed, 146 insertions(+)
 create mode 100644 
clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w32-gfx10-err.cl
 create mode 100644 
clang/test/CodeGenOpenCL/builtins-amdgcn-wmma-w64-gfx10-err.cl

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 532a91fd903e87c..a19c8bd5f219ec6 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -292,6 +292,8 @@ TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w32, 
"V8fV16hV16hV8f", "nc
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32, "V8fV16sV16sV8f", 
"nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w32, 
"V16hV16hV16hV16hIb", "nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32, 
"V16sV16sV16sV16sIb", "nc", "gfx11-insts")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32, 
"V16hV16hV16hV16hIb", "nc", "gfx11-insts")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32, 
"V16sV16sV16sV16sIb", "nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32, 
"V8iIbV4iIbV4iV8iIb", "nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w32, 
"V8iIbV2iIbV2iV8iIb", "nc", "gfx11-insts")
 
@@ -299,6 +301,8 @@ TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_f16_w64, 
"V4fV16hV16hV4f", "nc
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64, "V4fV16sV16sV4f", 
"nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_w64, "V8hV16hV16hV8hIb", 
"nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64, 
"V8sV16sV16sV8sIb", "nc", "gfx11-insts")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64, 
"V8hV16hV16hV8hIb", "nc", "gfx11-insts")
+TARGET_BUILTIN(__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64, 
"V8sV16sV16sV8sIb", "nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64, 
"V4iIbV4iIbV4iV4iIb", "nc", "gfx11-insts")
 TARGET_BUILTIN(__builtin_amdgcn_wmma_i32_16x16x16_iu4_w64, 
"V4iIbV2iIbV2iV4iIb", "nc", "gfx11-insts")
 
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index d49c44dbaace3a8..f3c989a76cbc380 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -17936,9 +17936,13 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   }
 
   case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w32:
+  case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32:
   case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_w64:
+  case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64:
   case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_w32:
+  case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32:
   case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_w64:
+  case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64:
   case AMDGPU::BI__builtin_amdgcn_wmma_f32_16x16x16_bf16_w32:
   case AMDGPU::BI__builtin_amdgcn_wmma_f32_16x16x16_bf16_w64:
   case AMDGPU::BI__builtin_amdgcn_wmma_f32_16x16x16_f16_w32:
@@ -17976,6 +17980,16 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   ArgForMatchingRetType = 2;
   BuiltinWMMAOp = Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16;
   break;
+case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w32:
+case AMDGPU::BI__builtin_amdgcn_wmma_f16_16x16x16_f16_tied_w64:
+  ArgForMatchingRetType = 2;
+  BuiltinWMMAOp = Intrinsic::amdgcn_wmma_f16_16x16x16_f16_tied;
+  break;
+case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w32:
+case AMDGPU::BI__builtin_amdgcn_wmma_bf16_16x16x16_bf16_tied_w64:
+  ArgForMatchingRetType = 2;
+  BuiltinWMMAOp = Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16_tied;
+  break;
 case AMDGPU::BI__builtin_amdgcn_wmma_i32_16x16x16_iu8_w32:
 case AMDGPU::BI__builtin_amdgcn_wmma_i32_16x16x16_iu8_w64:
   ArgForMatchingRetType = 4;
diff --git a/clang/test/CodeGenOpenCL/bui

[llvm] [clang] [InstCombine] Infer zext nneg flag (PR #71534)

2023-11-09 Thread Nikita Popov via cfe-commits

nikic wrote:

It looks like simplifyAssocCastAssoc() is the problematic transform. It 
modifies a zext in-place without clearing poison flags.
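
To illustrate why that matters, here is a toy model, not LLVM code. `ToyZExt`, `ToyResult` and the two rewrite helpers are hypothetical names. The only fact assumed about the real IR is the documented meaning of `nneg`: if the flag is set and the operand is negative, the result is poison.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for 'zext i8 %x to i32' with an optional nneg flag.
struct ToyZExt {
  std::uint8_t Operand;
  bool NonNeg; // models the 'nneg' poison-generating flag
};

struct ToyResult {
  bool IsPoison;
  std::uint32_t Value;
};

// nneg promises the operand's sign bit is clear; breaking that promise
// yields poison instead of a value.
ToyResult evaluate(const ToyZExt &Z) {
  bool SignBitSet = (Z.Operand & 0x80u) != 0;
  if (Z.NonNeg && SignBitSet)
    return {true, 0};
  return {false, Z.Operand};
}

// Buggy pattern: swap the operand in place but keep the old flag, which
// was only proven for the old operand.
void rewriteKeepingFlags(ToyZExt &Z, std::uint8_t NewOperand) {
  Z.Operand = NewOperand;
}

// Safe pattern: swap the operand and drop the flag.
void rewriteDroppingFlags(ToyZExt &Z, std::uint8_t NewOperand) {
  Z.Operand = NewOperand;
  Z.NonNeg = false;
}
```

Starting from `{0x10, true}` and rewriting the operand to `0x90`, the first helper leaves an instruction that evaluates to poison, while the second still produces `0x90`.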

https://github.com/llvm/llvm-project/pull/71534


[clang] [llvm] [InstCombine] Infer zext nneg flag (PR #71534)

2023-11-09 Thread Nikita Popov via cfe-commits

nikic wrote:

Should be fixed by 
https://github.com/llvm/llvm-project/commit/1b1c81772fe50a1cb2b2adf8d8cf442c0b73602f.

https://github.com/llvm/llvm-project/pull/71534


[clang] [clang][analyzer] Improve StdLibraryFunctionsChecker 'readlink' modeling. (PR #71373)

2023-11-09 Thread via cfe-commits
Balázs Kéri

https://github.com/DonatNagyE approved this pull request.

Thanks for adding the missing TC!

https://github.com/llvm/llvm-project/pull/71373


[clang] [clang][Interp] Implement inc/dec for IntegralAP (PR #69597)

2023-11-09 Thread Timm Baeder via cfe-commits

https://github.com/tbaederr updated 
https://github.com/llvm/llvm-project/pull/69597

>From be120871fa8486ce9dd6cabb0a0b27d8371896b8 Mon Sep 17 00:00:00 2001
From: Timm Bäder
Date: Wed, 18 Oct 2023 15:36:13 +0200
Subject: [PATCH] [clang][Interp] Implement inc/dec for IntegralAP

---
 clang/lib/AST/Interp/IntegralAP.h | 12 ++---
 clang/test/AST/Interp/intap.cpp   | 81 ---
 2 files changed, 68 insertions(+), 25 deletions(-)

diff --git a/clang/lib/AST/Interp/IntegralAP.h 
b/clang/lib/AST/Interp/IntegralAP.h
index 88de1f1392e6813..82da79a55b05312 100644
--- a/clang/lib/AST/Interp/IntegralAP.h
+++ b/clang/lib/AST/Interp/IntegralAP.h
@@ -177,17 +177,13 @@ template  class IntegralAP final {
   }
 
   static bool increment(IntegralAP A, IntegralAP *R) {
-// FIXME: Implement.
-assert(false);
-*R = IntegralAP(A.V - 1);
-return false;
+IntegralAP One(1, A.bitWidth());
+return add(A, One, A.bitWidth() + 1, R);
   }
 
   static bool decrement(IntegralAP A, IntegralAP *R) {
-// FIXME: Implement.
-assert(false);
-*R = IntegralAP(A.V - 1);
-return false;
+IntegralAP One(1, A.bitWidth());
+return sub(A, One, A.bitWidth() + 1, R);
   }
 
   static bool add(IntegralAP A, IntegralAP B, unsigned OpBits, IntegralAP *R) {
diff --git a/clang/test/AST/Interp/intap.cpp b/clang/test/AST/Interp/intap.cpp
index 34c8d0565082994..73c795732ff1055 100644
--- a/clang/test/AST/Interp/intap.cpp
+++ b/clang/test/AST/Interp/intap.cpp
@@ -43,9 +43,25 @@ namespace APCast {
 }
 
 #ifdef __SIZEOF_INT128__
+typedef __int128 int128_t;
+typedef unsigned __int128 uint128_t;
+static const __uint128_t UINT128_MAX =__uint128_t(__int128_t(-1L));
+static_assert(UINT128_MAX == -1, "");
+static_assert(UINT128_MAX == 1, ""); // expected-error {{static assertion 
failed}} \
+ // expected-note 
{{'340282366920938463463374607431768211455 == 1'}} \
+ // ref-error {{static assertion failed}} \
+ // ref-note 
{{'340282366920938463463374607431768211455 == 1'}}
+
+static const __int128_t INT128_MAX = UINT128_MAX >> (__int128_t)1;
+static_assert(INT128_MAX != 0, "");
+static_assert(INT128_MAX == 0, ""); // expected-error {{failed}} \
+// expected-note {{evaluates to 
'170141183460469231731687303715884105727 == 0'}} \
+// ref-error {{failed}} \
+// ref-note {{evaluates to 
'170141183460469231731687303715884105727 == 0'}}
+static const __int128_t INT128_MIN = -INT128_MAX - 1;
+
 namespace i128 {
-  typedef __int128 int128_t;
-  typedef unsigned __int128 uint128_t;
+
   constexpr int128_t I128_1 = 12;
   static_assert(I128_1 == 12, "");
   static_assert(I128_1 != 10, "");
@@ -54,21 +70,6 @@ namespace i128 {
// expected-note{{evaluates to}} \
// ref-note{{evaluates to}}
 
-  static const __uint128_t UINT128_MAX =__uint128_t(__int128_t(-1L));
-  static_assert(UINT128_MAX == -1, "");
-  static_assert(UINT128_MAX == 1, ""); // expected-error {{static assertion 
failed}} \
-   // expected-note 
{{'340282366920938463463374607431768211455 == 1'}} \
-   // ref-error {{static assertion 
failed}} \
-   // ref-note 
{{'340282366920938463463374607431768211455 == 1'}}
-
-  static const __int128_t INT128_MAX = UINT128_MAX >> (__int128_t)1;
-  static_assert(INT128_MAX != 0, "");
-  static_assert(INT128_MAX == 0, ""); // expected-error {{failed}} \
-  // expected-note {{evaluates to 
'170141183460469231731687303715884105727 == 0'}} \
-  // ref-error {{failed}} \
-  // ref-note {{evaluates to 
'170141183460469231731687303715884105727 == 0'}}
-
-  static const __int128_t INT128_MIN = -INT128_MAX - 1;
   constexpr __int128 A = INT128_MAX + 1; // expected-error {{must be 
initialized by a constant expression}} \
  // expected-note {{value 
170141183460469231731687303715884105728 is outside the range}} \
  // ref-error {{must be initialized by 
a constant expression}} \
@@ -157,4 +158,50 @@ namespace Bitfields {
 // expected-warning {{changes value from 100 to 0}}
 }
 
+namespace IncDec {
+#if __cplusplus >= 201402L
+  constexpr int128_t maxPlus1(bool Pre) {
+int128_t a = INT128_MAX;
+
+if (Pre)
+  ++a; // ref-note {{value 170141183460469231731687303715884105728 is 
outside the range}} \
+   // expected-note {{value 170141183460469231731687303715884105728 is 
outside the range}}
+else
+  a++; // ref-note {{value 170141183460469231731687303715884105728 is 
outside the rang

[llvm] [clang] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (PR #71795)

2023-11-09 Thread Matthew Devereau via cfe-commits

https://github.com/MDevereau created 
https://github.com/llvm/llvm-project/pull/71795

Adds the builtins:
void svldr_zt(uint64_t zt, const void *rn)
void svstr_zt(uint64_t zt, void *rn)

And the intrinsics:
call void @llvm.aarch64.sme.ldr.zt(i32, ptr)
tail call void @llvm.aarch64.sme.str.zt(i32, ptr)

Patch by: Kerry McLaughlin 

>From 9846bc9efd79e6e3c2662ea42367c102df88799d Mon Sep 17 00:00:00 2001
From: Matt Devereau 
Date: Thu, 9 Nov 2023 10:50:05 +
Subject: [PATCH] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics

Adds the builtins:

void svldr_zt(uint64_t zt, const void *rn)
void svstr_zt(uint64_t zt, void *rn)

And the intrinsics:
call void @llvm.aarch64.sme.ldr.zt(i32, ptr)
tail call void @llvm.aarch64.sme.str.zt(i32, ptr)
---
 clang/include/clang/Basic/arm_sme.td  |  5 ++
 clang/include/clang/Basic/arm_sve.td  |  9 
 .../acle_sme2_ldr_str_zt.c| 51 +++
 llvm/include/llvm/IR/IntrinsicsAArch64.td | 11 ++--
 .../Target/AArch64/AArch64ISelDAGToDAG.cpp|  7 ++-
 .../Target/AArch64/AArch64ISelLowering.cpp| 21 
 llvm/lib/Target/AArch64/AArch64ISelLowering.h |  2 +
 .../Target/AArch64/AArch64RegisterInfo.cpp|  6 +++
 .../lib/Target/AArch64/AArch64SMEInstrInfo.td |  4 +-
 llvm/lib/Target/AArch64/SMEInstrFormats.td| 23 +++--
 .../CodeGen/AArch64/sme2-intrinsics-zt0.ll| 27 ++
 11 files changed, 153 insertions(+), 13 deletions(-)
 create mode 100644 
clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c
 create mode 100644 llvm/test/CodeGen/AArch64/sme2-intrinsics-zt0.ll

diff --git a/clang/include/clang/Basic/arm_sme.td 
b/clang/include/clang/Basic/arm_sme.td
index b5655afdf419ecf..fe3de56ce3298c5 100644
--- a/clang/include/clang/Basic/arm_sme.td
+++ b/clang/include/clang/Basic/arm_sme.td
@@ -298,3 +298,8 @@ multiclass ZAAddSub {
 
 defm SVADD : ZAAddSub<"add">;
 defm SVSUB : ZAAddSub<"sub">;
+
+let TargetGuard = "sme2" in {
+  def SVLDR_ZT : Inst<"svldr_zt", "viQ", "", MergeNone, "aarch64_sme_ldr_zt", 
[IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], 
[ImmCheck<0, ImmCheck0_0>]>;
+  def SVSTR_ZT : Inst<"svstr_zt", "vi%", "", MergeNone, "aarch64_sme_str_zt", 
[IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], 
[ImmCheck<0, ImmCheck0_0>]>;
+}
diff --git a/clang/include/clang/Basic/arm_sve.td 
b/clang/include/clang/Basic/arm_sve.td
index 3d4c2129565903d..f0b3747898d4145 100644
--- a/clang/include/clang/Basic/arm_sve.td
+++ b/clang/include/clang/Basic/arm_sve.td
@@ -1813,6 +1813,15 @@ def SVWHILERW_H_BF16 : SInst<"svwhilerw[_{1}]", "Pcc", 
"b", MergeNone, "aarch64_
 def SVWHILEWR_H_BF16 : SInst<"svwhilewr[_{1}]", "Pcc", "b", MergeNone, 
"aarch64_sve_whilewr_h", [IsOverloadWhileRW]>;
 }
 
+// //
+// // Spill and fill of ZT0
+// //
+
+// let TargetGuard = "sme2" in {
+//   def SVLDR_ZT : Inst<"svldr_zt", "viQ", "", "aarch64_sme_ldr_zt", 
[IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], 
[ImmCheck<0, ImmCheck0_0>]>;
+//   def SVSTR_ZT : Inst<"svstr_zt", "vi%", "", "aarch64_sme_str_zt", 
[IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], 
[ImmCheck<0, ImmCheck0_0>]>;
+// }
+
 

 // SVE2 - Extended table lookup/permute
 let TargetGuard = "sve2" in {
diff --git a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c 
b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c
new file mode 100644
index 000..3d70ded6b469ba1
--- /dev/null
+++ b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c
@@ -0,0 +1,51 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+
+// REQUIRES: aarch64-registered-target
+
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S 
-disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p 
mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S 
-disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p 
mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | 
opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x 
c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s 
-check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S 
-disable-O0-optnone -Werror -Wall -o /dev/null %s
+
+#include 
+
+#ifdef SVE_OVERLOADED_FORMS
+// A simple used,unused... macro, long enough to represent any SVE builtin.
+#define SVE_ACLE_FUNC(A1,A2_UNUSED) A1
+#else
+#define SVE_ACLE_FUNC(A

[llvm] [clang] [InstCombine] Infer zext nneg flag (PR #71534)

2023-11-09 Thread via cfe-commits

mikaelholmen wrote:

> Should be fixed by 
> [1b1c817](https://github.com/llvm/llvm-project/commit/1b1c81772fe50a1cb2b2adf8d8cf442c0b73602f).

I've confirmed that the instances of the problem that we saw are fixed by 
1b1c81772fe50a.
Thanks!

https://github.com/llvm/llvm-project/pull/71534


[llvm] [clang] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (PR #71795)

2023-11-09 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Matthew Devereau (MDevereau)


Changes

Adds the builtins:
void svldr_zt(uint64_t zt, const void *rn)
void svstr_zt(uint64_t zt, void *rn)

And the intrinsics:
call void @llvm.aarch64.sme.ldr.zt(i32, ptr)
tail call void @llvm.aarch64.sme.str.zt(i32, ptr)

Patch by: Kerry McLaughlin 

---
Full diff: https://github.com/llvm/llvm-project/pull/71795.diff


11 Files Affected:

- (modified) clang/include/clang/Basic/arm_sme.td (+5) 
- (modified) clang/include/clang/Basic/arm_sve.td (+9) 
- (added) clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c 
(+51) 
- (modified) llvm/include/llvm/IR/IntrinsicsAArch64.td (+7-4) 
- (modified) llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (+5-2) 
- (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+21) 
- (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.h (+2) 
- (modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp (+6) 
- (modified) llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td (+2-2) 
- (modified) llvm/lib/Target/AArch64/SMEInstrFormats.td (+18-5) 
- (added) llvm/test/CodeGen/AArch64/sme2-intrinsics-zt0.ll (+27) 


``diff
diff --git a/clang/include/clang/Basic/arm_sme.td 
b/clang/include/clang/Basic/arm_sme.td
index b5655afdf419ecf..fe3de56ce3298c5 100644
--- a/clang/include/clang/Basic/arm_sme.td
+++ b/clang/include/clang/Basic/arm_sme.td
@@ -298,3 +298,8 @@ multiclass ZAAddSub {
 
 defm SVADD : ZAAddSub<"add">;
 defm SVSUB : ZAAddSub<"sub">;
+
+let TargetGuard = "sme2" in {
+  def SVLDR_ZT : Inst<"svldr_zt", "viQ", "", MergeNone, "aarch64_sme_ldr_zt", 
[IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], 
[ImmCheck<0, ImmCheck0_0>]>;
+  def SVSTR_ZT : Inst<"svstr_zt", "vi%", "", MergeNone, "aarch64_sme_str_zt", 
[IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], 
[ImmCheck<0, ImmCheck0_0>]>;
+}
diff --git a/clang/include/clang/Basic/arm_sve.td 
b/clang/include/clang/Basic/arm_sve.td
index 3d4c2129565903d..f0b3747898d4145 100644
--- a/clang/include/clang/Basic/arm_sve.td
+++ b/clang/include/clang/Basic/arm_sve.td
@@ -1813,6 +1813,15 @@ def SVWHILERW_H_BF16 : SInst<"svwhilerw[_{1}]", "Pcc", 
"b", MergeNone, "aarch64_
 def SVWHILEWR_H_BF16 : SInst<"svwhilewr[_{1}]", "Pcc", "b", MergeNone, 
"aarch64_sve_whilewr_h", [IsOverloadWhileRW]>;
 }
 
+// //
+// // Spill and fill of ZT0
+// //
+
+// let TargetGuard = "sme2" in {
+//   def SVLDR_ZT : Inst<"svldr_zt", "viQ", "", "aarch64_sme_ldr_zt", 
[IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], 
[ImmCheck<0, ImmCheck0_0>]>;
+//   def SVSTR_ZT : Inst<"svstr_zt", "vi%", "", "aarch64_sme_str_zt", 
[IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], 
[ImmCheck<0, ImmCheck0_0>]>;
+// }
+
 

 // SVE2 - Extended table lookup/permute
 let TargetGuard = "sve2" in {
diff --git a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c 
b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c
new file mode 100644
index 000..3d70ded6b469ba1
--- /dev/null
+++ b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c
@@ -0,0 +1,51 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+
+// REQUIRES: aarch64-registered-target
+
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S 
-disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p 
mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S 
-disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p 
mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | 
opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x 
c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s 
-check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S 
-disable-O0-optnone -Werror -Wall -o /dev/null %s
+
+#include 
+
+#ifdef SVE_OVERLOADED_FORMS
+// A simple used,unused... macro, long enough to represent any SVE builtin.
+#define SVE_ACLE_FUNC(A1,A2_UNUSED) A1
+#else
+#define SVE_ACLE_FUNC(A1,A2) A1##A2
+#endif
+
+// LDR ZT0
+
+// CHECK-LABEL: @test_svldr_zt(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:tail call void @llvm.aarch64.sme.ldr.zt(i32 0, ptr 
[[BASE:%.*]])
+// CHECK-NEXT:ret void
+//
+// CPP-CHECK-LABEL: @_Z13test_svldr_ztPKv(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:tail call void @llvm.aarch64.sme.ldr.zt(i32 0, ptr 
[[BASE:%.*]])
+// CPP-CHECK-NEXT:ret void
+//
+void test

[clang] [llvm] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (PR #71795)

2023-11-09 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-ir


[clang] [clang][Interp] Implement IntegralAP subtraction (PR #71648)

2023-11-09 Thread Timm Baeder via cfe-commits

https://github.com/tbaederr updated https://github.com/llvm/llvm-project/pull/71648

>From f1421c190fd480a664bab80281db1e8abb1056a1 Mon Sep 17 00:00:00 2001
From: Timm Bäder
Date: Wed, 8 Nov 2023 06:49:41 +0100
Subject: [PATCH] [clang][Interp] Implement IntegralAP subtraction

---
 clang/lib/AST/Interp/IntegralAP.h | 32 ---
 clang/test/AST/Interp/intap.cpp   | 15 +++
 2 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/clang/lib/AST/Interp/IntegralAP.h 
b/clang/lib/AST/Interp/IntegralAP.h
index 88de1f1392e6813..b8e37878ce2f848 100644
--- a/clang/lib/AST/Interp/IntegralAP.h
+++ b/clang/lib/AST/Interp/IntegralAP.h
@@ -191,12 +191,11 @@ template  class IntegralAP final {
   }
 
   static bool add(IntegralAP A, IntegralAP B, unsigned OpBits, IntegralAP *R) {
-return CheckAddUB(A, B, OpBits, R);
+return CheckAddSubUB(A, B, OpBits, R);
   }
 
   static bool sub(IntegralAP A, IntegralAP B, unsigned OpBits, IntegralAP *R) {
-/// FIXME: Gotta check if the result fits into OpBits bits.
-return CheckSubUB(A, B, R);
+return CheckAddSubUB(A, B, OpBits, R);
   }
 
   static bool mul(IntegralAP A, IntegralAP B, unsigned OpBits, IntegralAP *R) {
@@ -264,28 +263,21 @@ template  class IntegralAP final {
   }
 
 private:
-  static bool CheckAddUB(const IntegralAP &A, const IntegralAP &B,
- unsigned BitWidth, IntegralAP *R) {
-if (!A.isSigned()) {
-  R->V = A.V + B.V;
+  template  class Op>
+  static bool CheckAddSubUB(const IntegralAP &A, const IntegralAP &B,
+unsigned BitWidth, IntegralAP *R) {
+if constexpr (!Signed) {
+  R->V = Op{}(A.V, B.V);
   return false;
 }
 
-const APSInt &LHS = APSInt(A.V, A.isSigned());
-const APSInt &RHS = APSInt(B.V, B.isSigned());
-
-APSInt Value(LHS.extend(BitWidth) + RHS.extend(BitWidth), false);
+const APSInt &LHS = A.toAPSInt();
+const APSInt &RHS = B.toAPSInt();
+APSInt Value = Op{}(LHS.extend(BitWidth), RHS.extend(BitWidth));
 APSInt Result = Value.trunc(LHS.getBitWidth());
-if (Result.extend(BitWidth) != Value)
-  return true;
-
 R->V = Result;
-return false;
-  }
-  static bool CheckSubUB(const IntegralAP &A, const IntegralAP &B,
- IntegralAP *R) {
-R->V = A.V - B.V;
-return false; // Success!
+
+return Result.extend(BitWidth) != Value;
   }
 };
 
diff --git a/clang/test/AST/Interp/intap.cpp b/clang/test/AST/Interp/intap.cpp
index 34c8d0565082994..c3cae9a64780d5c 100644
--- a/clang/test/AST/Interp/intap.cpp
+++ b/clang/test/AST/Interp/intap.cpp
@@ -11,7 +11,12 @@ constexpr _BitInt(2) B = A + 1;
 constexpr _BitInt(2) C = B + 1; // expected-warning {{from 2 to -2}} \
 // ref-warning {{from 2 to -2}}
 static_assert(C == -2, "");
+static_assert(C - B == A, ""); // expected-error {{not an integral constant expression}} \
+                               // expected-note {{value -3 is outside the range of representable values}} \
+                               // ref-error {{not an integral constant expression}} \
+                               // ref-note {{value -3 is outside the range of representable values}}
 
+static_assert(B - 1 == 0, "");
 
 constexpr MaxBitInt A_ = 0;
 constexpr MaxBitInt B_ = A_ + 1;
@@ -130,6 +135,16 @@ namespace i128 {
 // expected-warning {{implicit conversion of out of range value}} \
 // expected-error {{must be initialized by a constant expression}} \
 // expected-note {{is outside the range of representable values of type}}
+
+  constexpr uint128_t Zero = 0;
+  static_assert((Zero -1) == -1, "");
+  constexpr int128_t Five = 5;
+  static_assert(Five - Zero == Five, "");
+
+  constexpr int128_t Sub1 = INT128_MIN - 1; // expected-error {{must be initialized by a constant expression}} \
+                                            // expected-note {{-170141183460469231731687303715884105729 is outside the range}} \
+                                            // ref-error {{must be initialized by a constant expression}} \
+                                            // ref-note {{-170141183460469231731687303715884105729 is outside the range}}
 }
 
 namespace AddSubOffset {
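
For readers following the patch, the overflow check it unifies for add and sub can be sketched on plain fixed-width integers: widen both operands, apply the operation, truncate, and report overflow if widening the truncated result no longer matches. This is only an illustration; `checkAddSubOverflow` and the 8-bit/32-bit widths are assumptions made here, not the `IntegralAP` code, which works on `APSInt` values of arbitrary width.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical 8-bit analogue of the patch's CheckAddSubUB helper.
// Returns true on overflow and false on success, as the patch does.
template <template <typename> class Op>
bool checkAddSubOverflow(std::int8_t A, std::int8_t B, std::int8_t *R) {
  // Widen both operands so the operation itself cannot overflow.
  const std::int32_t Value =
      Op<std::int32_t>{}(std::int32_t{A}, std::int32_t{B});
  // Truncate back to the operand width. An out-of-range conversion wraps
  // modulo 2^8 (guaranteed since C++20; common compilers did so earlier).
  const auto Result = static_cast<std::int8_t>(Value);
  *R = Result;
  // Widening the truncated result only round-trips if nothing was lost.
  return std::int32_t{Result} != Value;
}
```

Passing `std::plus` or `std::minus` as `Op` selects the operation, which is roughly how a single helper can serve both `add` and `sub`.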



[clang] [clang][Interp] Implement IntegralAP subtraction (PR #71648)

2023-11-09 Thread Timm Baeder via cfe-commits

tbaederr wrote:

Tests should work now

https://github.com/llvm/llvm-project/pull/71648


[clang] [clang][Interp] Implement builtin_expect (PR #69713)

2023-11-09 Thread Timm Baeder via cfe-commits


tbaederr wrote:

Ping

https://github.com/llvm/llvm-project/pull/69713


[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Joseph Huber via cfe-commits


@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, 
AMDGenericDeviceTy {
   using AMDGPUEventRef = AMDGPUResourceRef;
   using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy;
 
+  /// Common method to invoke a single threaded constructor or destructor
+  /// kernel by name.
+  Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image,
+ const char *Name) {
+// Perform a quick check for the named kernel in the image. The kernel
+// should be created by the 'amdgpu-lower-ctor-dtor' pass.
+GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+GlobalTy Global(Name, sizeof(void *));
+if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
+  consumeError(std::move(Err));
+  return Error::success();

jhuber6 wrote:

If there are any global ctors / dtors, the backend will emit a kernel. This is 
simply encoding "Does this symbol exist? If not continue on". We check the ELF 
symbol table directly as it's more efficient than going through the device API.

We probably need to encode the logic better, since `consumeError` is a bit of a 
code smell. Maybe a helper function like `Handler.hasGlobal` or something.
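
A rough sketch of what such a helper could look like, using stand-in types so it compiles on its own. `MockImage`, `hasGlobal` and `callGlobalCtorDtor` are hypothetical names for illustration only; the real plugin would query the image's ELF symbol table through `GenericGlobalHandlerTy`.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a device image and its ELF symbol table.
struct MockImage {
  std::unordered_set<std::string> Symbols;
};

// Hypothetical helper: answers "does this symbol exist?" with a plain bool,
// so the caller has no error object to create and then discard.
bool hasGlobal(const MockImage &Image, const std::string &Name) {
  return Image.Symbols.count(Name) != 0;
}

// Returns true if a ctor/dtor kernel was found (and would be launched).
bool callGlobalCtorDtor(const MockImage &Image, const std::string &Name) {
  if (!hasGlobal(Image, Name))
    return false; // No such kernel: nothing to run, and not an error.
  // ... a real plugin would allocate and launch the kernel here ...
  return true;
}
```

The early return then reads as "no kernel, nothing to do" rather than as a swallowed error.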

https://github.com/llvm/llvm-project/pull/71739


[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Joseph Huber via cfe-commits


@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, 
AMDGenericDeviceTy {
   using AMDGPUEventRef = AMDGPUResourceRef;
   using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy;
 
+  /// Common method to invoke a single threaded constructor or destructor
+  /// kernel by name.
+  Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image,
+ const char *Name) {
+// Perform a quick check for the named kernel in the image. The kernel
+// should be created by the 'amdgpu-lower-ctor-dtor' pass.
+GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+GlobalTy Global(Name, sizeof(void *));
+if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
+  consumeError(std::move(Err));
+  return Error::success();
+}
+
+// Allocate and construct the AMDGPU kernel.
+GenericKernelTy *AMDGPUKernel = Plugin.allocate();
+if (!AMDGPUKernel)
+  return Plugin::error("Failed to allocate memory for AMDGPU kernel");
+
+new (AMDGPUKernel) AMDGPUKernelTy(Name);
+if (auto Err = AMDGPUKernel->initImpl(*this, Image))
+  return std::move(Err);
+
+auto *AsyncInfoPtr = Plugin.allocate<__tgt_async_info>();
+AsyncInfoWrapperTy AsyncInfoWrapper(*this, AsyncInfoPtr);
+
+if (auto Err = initAsyncInfoImpl(AsyncInfoWrapper))
+  return std::move(Err);
+
+KernelArgsTy KernelArgs = {};
+if (auto Err = AMDGPUKernel->launchImpl(*this, /*NumThread=*/1u,
+/*NumBlocks=*/1ul, KernelArgs,
+/*Args=*/nullptr, 
AsyncInfoWrapper))
+  return std::move(Err);
+
+if (auto Err = synchronize(AsyncInfoPtr))
+  return std::move(Err);
+Error Err = Error::success();

jhuber6 wrote:

Yes

https://github.com/llvm/llvm-project/pull/71739


[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/71739

>From 0a1f4b5d514a5e1525e3178a80f6e8f5638bfb69 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 7 Nov 2023 17:12:31 -0600
Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP

Summary:
This patch reworks how we handle global constructors in OpenMP.
Previously, we emitted individual kernels that were all registered and
called individually. In order to provide more generic support, this
patch moves all handling of this to the target backend and the runtime
plugin. This has the benefit of supporting the GNU extensions for
constructors and destructors, removing a class of failures related to
shared library destruction order, and allowing targets other than OpenMP
to use the same support without needing to change the frontend.

This is primarily done by calling kernels that the backend emits to
iterate a list of ctor / dtor functions. For x64, this is automatic and
we get it for free with the standard `dlopen` handling. For AMDGPU, we
emit `amdgcn.device.init` and `amdgcn.device.fini` functions which
handle everything automatically and simply need to be called. For NVPTX,
a patch https://github.com/llvm/llvm-project/pull/71549 provides the
kernels to call, but the runtime needs to set up the array manually by
pulling out all the known constructor / destructor functions.
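
As a host-side illustration of the idea (not the emitted device code), the init and fini kernels can be thought of as loops over a list of function pointers. The names `CtorDtorFn`, `runInit` and `runFini` are hypothetical, and running destructors in reverse order is an assumption of this sketch rather than something stated above.

```cpp
#include <vector>

// Hypothetical signature shared by constructor and destructor entries.
using CtorDtorFn = void (*)();

// Sketch of an init kernel: call every registered constructor in list order.
void runInit(const std::vector<CtorDtorFn> &Ctors) {
  for (CtorDtorFn Fn : Ctors)
    Fn();
}

// Sketch of a fini kernel: call the destructors, here assumed to run in the
// reverse of their registration order.
void runFini(const std::vector<CtorDtorFn> &Dtors) {
  for (auto It = Dtors.rbegin(); It != Dtors.rend(); ++It)
    (*It)();
}
```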

One concession that this patch requires is the change that for GPU
targets in OpenMP offloading we will use `llvm.global_dtors` instead of
using `atexit`. This is because `atexit` is a separate runtime function
that does not mesh well with the handling we're trying to do here. This
should be equivalent in all cases except for cases where we would need
to destruct manually such as:

```
void foo();
struct S { ~S() { foo(); } };
void foo() {
  static S s;
}
```

However this is broken in many other ways on the GPU, so it is not
regressing any support, simply increasing the scope of what we can
handle.

This changes the handling of ctors / dtors. This patch now outputs an
informational message regarding the deprecation if the old format is used.
Support for the old format will be completely removed in a later release.

Depends on: https://github.com/llvm/llvm-project/pull/71549
---
 clang/lib/CodeGen/CGDeclCXX.cpp   |  13 +-
 clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 --
 clang/lib/CodeGen/CGOpenMPRuntime.h   |   8 --
 clang/lib/CodeGen/CodeGenFunction.h   |   5 +
 clang/lib/CodeGen/CodeGenModule.h |  14 +-
 clang/lib/CodeGen/ItaniumCXXABI.cpp   |   8 ++
 .../amdgcn_openmp_device_math_constexpr.cpp   |  48 +--
 .../amdgcn_target_global_constructor.cpp  |  30 ++--
 clang/test/OpenMP/declare_target_codegen.cpp  |   1 -
 ...x_declare_target_var_ctor_dtor_codegen.cpp |  35 +
 .../llvm/Frontend/OpenMP/OMPIRBuilder.h   |   4 -
 llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp |   7 +-
 .../plugins-nextgen/amdgpu/src/rtl.cpp|  52 +++
 .../common/PluginInterface/GlobalHandler.h|  10 +-
 .../PluginInterface/PluginInterface.cpp   |   7 +
 .../common/PluginInterface/PluginInterface.h  |  14 ++
 .../plugins-nextgen/cuda/src/rtl.cpp  | 115 
 openmp/libomptarget/src/rtl.cpp   |   6 +
 18 files changed, 291 insertions(+), 216 deletions(-)

diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp
index 3fa28b343663f61..e08a1e5f42df20c 100644
--- a/clang/lib/CodeGen/CGDeclCXX.cpp
+++ b/clang/lib/CodeGen/CGDeclCXX.cpp
@@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const 
VarDecl &VD,
   registerGlobalDtorWithAtExit(dtorStub);
 }
 
+/// Register a global destructor using the LLVM 'llvm.global_dtors' global.
+void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD,
+ llvm::FunctionCallee Dtor,
+ llvm::Constant *Addr) {
+  // Create a function which calls the destructor.
+  llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr);
+  CGM.AddGlobalDtor(dtorStub);
+}
+
 void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) {
   // extern "C" int atexit(void (*f)(void));
   assert(dtorStub->getType() ==
@@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl 
*D,
D->hasAttr()))
 return;
 
-  if (getLangOpts().OpenMP &&
-  getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit))
-return;
-
   // Check if we've already initialized this decl.
   auto I = DelayedCXXInitPosition.find(D);
   if (I != DelayedCXXInitPosition.end() && I->second == ~0U)
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index a8e1150e44566b8..d2be8141a3a4b31 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1747,136 +1747,6 @@ llvm::Function 
*CGOpenMPRuntime::emi

[llvm] [clang] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (PR #71795)

2023-11-09 Thread via cfe-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. :warning:

You can test this locally with the following command:

```bash
git-clang-format --diff 18bb9725619569687bec2c013768511105266a5e \
  9846bc9efd79e6e3c2662ea42367c102df88799d -- \
  clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c \
  llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp \
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp \
  llvm/lib/Target/AArch64/AArch64ISelLowering.h \
  llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
```

View the diff from clang-format here.


```diff
diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
index c011a46cf02a..abfe14e52509 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
@@ -326,7 +326,8 @@ public:
 return false;
   }
 
-  template  bool ImmToTile(SDValue N, SDValue &Imm) {
+  template 
+  bool ImmToTile(SDValue N, SDValue &Imm) {
 if (auto *CI = dyn_cast(N)) {
   uint64_t C = CI->getZExtValue();
 
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index c6ff3f1ce6a3..7404e04b8ea2 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -2754,12 +2754,11 @@ MachineBasicBlock *AArch64TargetLowering::EmitZTSpillFill(MachineInstr &MI,
   if (IsSpill) {
 MIB = BuildMI(*BB, MI, MI.getDebugLoc(), TII->get(AArch64::STR_TX));
 MIB.addReg(MI.getOperand(0).getReg());
-  }
-  else
+  } else
 MIB = BuildMI(*BB, MI, MI.getDebugLoc(), TII->get(AArch64::LDR_TX),
   MI.getOperand(0).getReg());
   MIB.add(MI.getOperand(1)); // Base
-  MI.eraseFromParent(); // The pseudo is gone now.
+  MI.eraseFromParent();  // The pseudo is gone now.
   return BB;
 }
 
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index af2181c0791b..0b4dde5e4d19 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -442,7 +442,7 @@ AArch64RegisterInfo::getStrictlyReservedRegs(const MachineFunction &MF) const {
 
   if (MF.getSubtarget().hasSME2()) {
 for (MCSubRegIterator SubReg(AArch64::ZT0, this, /*self=*/true);
-  SubReg.isValid(); ++SubReg)
+ SubReg.isValid(); ++SubReg)
   Reserved.set(*SubReg);
   }
 
```




https://github.com/llvm/llvm-project/pull/71795


[flang] [clang] [flang] add fveclib flag (PR #71734)

2023-11-09 Thread Kiran Chandramohan via cfe-commits


@@ -81,6 +81,17 @@ class CodeGenOptions : public CodeGenOptionsBase {
 RK_WithPattern, // Remark pattern specified via '-Rgroup=regexp'.
   };
 
+  enum class VectorLibrary {
+NoLibrary,  // Don't use any vector library.
+Accelerate, // Use the Accelerate framework.
+LIBMVEC,// GLIBC vector math library.
+MASSV,  // IBM MASS vector library.
+SVML,   // Intel short vector math library.
+SLEEF,  // SLEEF SIMD Library for Evaluating Elementary Functions.
+Darwin_libsystem_m, // Use Darwin's libsystem_m vector functions.
+ArmPL   // Arm Performance Libraries.
+  };

kiranchandramohan wrote:

Can this class be moved to a file in a new directory 
`llvm/include/llvm/Frontend/Driver` and shared with Clang?

https://github.com/llvm/llvm-project/pull/71734


[flang] [clang] [flang] add fveclib flag (PR #71734)

2023-11-09 Thread Kiran Chandramohan via cfe-commits


@@ -843,6 +843,44 @@ getOutputStream(CompilerInstance &ci, llvm::StringRef 
inFile,
   llvm_unreachable("Invalid action!");
 }
 
+static std::unique_ptr
+createTLII(llvm::Triple &targetTriple, const CodeGenOptions &codeGenOpts) {
+  auto tlii = std::make_unique(targetTriple);
+  assert(tlii && "Failed to create TargetLibraryInfo");
+
+  using VecLib = llvm::TargetLibraryInfoImpl::VectorLibrary;
+  VecLib vecLib = VecLib::NoLibrary;
+  switch (codeGenOpts.getVecLib()) {
+  case CodeGenOptions::VectorLibrary::Accelerate:
+vecLib = VecLib::Accelerate;
+break;
+  case CodeGenOptions::VectorLibrary::LIBMVEC:
+vecLib = VecLib::LIBMVEC_X86;
+break;
+  case CodeGenOptions::VectorLibrary::MASSV:
+vecLib = VecLib::MASSV;
+break;
+  case CodeGenOptions::VectorLibrary::SVML:
+vecLib = VecLib::SVML;
+break;
+  case CodeGenOptions::VectorLibrary::SLEEF:
+vecLib = VecLib::SLEEFGNUABI;
+break;
+  case CodeGenOptions::VectorLibrary::Darwin_libsystem_m:
+vecLib = VecLib::DarwinLibSystemM;
+break;
+  case CodeGenOptions::VectorLibrary::ArmPL:
+vecLib = VecLib::ArmPL;
+break;
+  case CodeGenOptions::VectorLibrary::NoLibrary:
+vecLib = VecLib::NoLibrary;
+break;
+  }
+
+  tlii->addVectorizableFunctionsFromVecLib(vecLib, targetTriple);
+  return tlii;
+}

kiranchandramohan wrote:

Can this code be moved to `llvm/lib/Frontend/Driver` and shared with Clang?

https://github.com/llvm/llvm-project/pull/71734


[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/71739


[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits

https://github.com/jplehr commented:

Thanks Joseph. Another two nits.

https://github.com/llvm/llvm-project/pull/71739


[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits


@@ -671,6 +671,20 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
   Error synchronize(__tgt_async_info *AsyncInfo);
   virtual Error synchronizeImpl(__tgt_async_info &AsyncInfo) = 0;
 
+  /// Invokes any global constructors on the device if present and is required
+  /// by the target.
+  virtual Error callGlobalConstructors(GenericPluginTy &Plugin,
+   DeviceImageTy &Image) {
+return Error::success();
+  }
+
+  /// Invokes any global destructors on the device if present and is required
+  /// by the target.
+  virtual Error callGlobalDestructors(GenericPluginTy &Plugin,
+  DeviceImageTy &Image) {
+return Error::success();

jplehr wrote:

Plugin::success()

https://github.com/llvm/llvm-project/pull/71739


[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits


@@ -671,6 +671,20 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
   Error synchronize(__tgt_async_info *AsyncInfo);
   virtual Error synchronizeImpl(__tgt_async_info &AsyncInfo) = 0;
 
+  /// Invokes any global constructors on the device if present and is required
+  /// by the target.
+  virtual Error callGlobalConstructors(GenericPluginTy &Plugin,
+   DeviceImageTy &Image) {
+return Error::success();

jplehr wrote:

Plugin::success()

https://github.com/llvm/llvm-project/pull/71739


[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Jan Patrick Lehr via cfe-commits


@@ -2627,6 +2637,48 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, 
AMDGenericDeviceTy {
   using AMDGPUEventRef = AMDGPUResourceRef;
   using AMDGPUEventManagerTy = GenericDeviceResourceManagerTy;
 
+  /// Common method to invoke a single threaded constructor or destructor
+  /// kernel by name.
+  Error callGlobalCtorDtorCommon(GenericPluginTy &Plugin, DeviceImageTy &Image,
+ const char *Name) {
+// Perform a quick check for the named kernel in the image. The kernel
+// should be created by the 'amdgpu-lower-ctor-dtor' pass.
+GenericGlobalHandlerTy &Handler = Plugin.getGlobalHandler();
+GlobalTy Global(Name, sizeof(void *));
+if (auto Err = Handler.getGlobalMetadataFromImage(*this, Image, Global)) {
+  consumeError(std::move(Err));
+  return Error::success();

jplehr wrote:

That would certainly make it more obvious.

https://github.com/llvm/llvm-project/pull/71739


[clang] [clang][Interp] Implement __builtin_bit_cast (PR #68288)

2023-11-09 Thread Timm Baeder via cfe-commits



@@ -0,0 +1,816 @@
+// RUN: %clang_cc1 -verify -std=c++2a -fsyntax-only 
-fexperimental-new-constant-interpreter %s
+// RUN: %clang_cc1 -verify=ref -std=c++2a -fsyntax-only %s
+// RUN: %clang_cc1 -verify -std=c++2a -fsyntax-only -triple 
aarch64_be-linux-gnu -fexperimental-new-constant-interpreter %s
+// RUN: %clang_cc1 -verify=ref -std=c++2a -fsyntax-only -triple 
aarch64_be-linux-gnu %s
+// RUN: %clang_cc1 -verify -std=c++2a -fsyntax-only 
-fexperimental-new-constant-interpreter -triple powerpc64le-unknown-unknown 
-mabi=ieeelongdouble %s
+// RUN: %clang_cc1 -verify=ref -std=c++2a -fsyntax-only -triple 
powerpc64le-unknown-unknown -mabi=ieeelongdouble %s
+// RUN: %clang_cc1 -verify -std=c++2a -fsyntax-only 
-fexperimental-new-constant-interpreter -triple powerpc64-unknown-unknown 
-mabi=ieeelongdouble %s
+// RUN: %clang_cc1 -verify=ref -std=c++2a -fsyntax-only -triple 
powerpc64-unknown-unknown -mabi=ieeelongdouble %s
+
+/// FIXME: This is a version of
+///   clang/test/SemaCXX/constexpr-builtin-bit-cast.cpp with the currently
+///   supported subset of operations. They should *all* be supported though.
+
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#  define LITTLE_END 1
+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#  define LITTLE_END 0
+#else
+#  error "huh?"
+#endif
+
+typedef decltype(nullptr) nullptr_t;
+
+
+
+static_assert(sizeof(int) == 4);
+static_assert(sizeof(long long) == 8);
+
+template <class To, class From>
+constexpr To bit_cast(const From &from) {
+  static_assert(sizeof(To) == sizeof(From));
+  return __builtin_bit_cast(To, from); // ref-note 2{{indeterminate value can only initialize}} \
+                                       // expected-note 2{{indeterminate value can only initialize}} \
+                                       // ref-note {{subexpression not valid}}
+}
+
+
+/// Current interpreter does not support this.
+/// https://github.com/llvm/llvm-project/issues/63686
+constexpr int FromString = bit_cast<int>("abc"); // ref-error {{must be initialized by a constant expression}} \
+                                                 // ref-note {{in call to}} \
+                                                 // ref-note {{declared here}}
+#if LITTLE_END
+static_assert(FromString == 6513249); // ref-error {{is not an integral constant expression}} \
+                                      // ref-note {{initializer of 'FromString' is not a constant expression}}
+#else
+static_assert(FromString == 1633837824); // ref-error {{is not an integral constant expression}} \

tbaederr wrote:

TIL that `constinit` variables aren't usable in constant expressions. But 
otherwise the test works.
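
As a side note on the two constants in the quoted `static_assert`s: they are the four bytes of `"abc"` (0x61, 0x62, 0x63 and the trailing NUL) read as a 32-bit integer in little-endian and big-endian order. A minimal runtime sketch, assuming an ASCII execution character set and a 4-byte integer; the helper names are made up for illustration and are not part of the test:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Value the bytes 0x61 0x62 0x63 0x00 denote on a little-endian target.
constexpr std::uint32_t LittleEndianValue = 0x00636261u; // 6513249
// Value the same bytes denote on a big-endian target.
constexpr std::uint32_t BigEndianValue = 0x61626300u; // 1633837824

// Runtime analogue of the quoted bit_cast: copy the bytes of "abc"
// (including the trailing NUL) into a 32-bit integer.
inline std::uint32_t bytesOfAbcAsInt() {
  const char Str[4] = {'a', 'b', 'c', '\0'};
  std::uint32_t Value;
  static_assert(sizeof(Value) == sizeof(Str), "sizes must match");
  std::memcpy(&Value, Str, sizeof(Value));
  return Value;
}

// Detect the host byte order by inspecting the first byte of the value 1.
inline bool hostIsLittleEndian() {
  const std::uint16_t Probe = 1;
  unsigned char FirstByte;
  std::memcpy(&FirstByte, &Probe, 1);
  return FirstByte == 1;
}

// Usage: the host result matches whichever constant fits its byte order.
inline void checkHostValue() {
  assert(bytesOfAbcAsInt() ==
         (hostIsLittleEndian() ? LittleEndianValue : BigEndianValue));
}
```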

https://github.com/llvm/llvm-project/pull/68288


[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Joseph Huber via cfe-commits


@@ -671,6 +671,20 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
   Error synchronize(__tgt_async_info *AsyncInfo);
   virtual Error synchronizeImpl(__tgt_async_info &AsyncInfo) = 0;
 
+  /// Invokes any global constructors on the device if present and is required
+  /// by the target.
+  virtual Error callGlobalConstructors(GenericPluginTy &Plugin,
+                                       DeviceImageTy &Image) {
+    return Error::success();

jhuber6 wrote:

This code is in the header above the definition of the `Plugin` class, so we 
can't use that without a complete reordering.

https://github.com/llvm/llvm-project/pull/71739


[openmp] [clang] [llvm] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Matt Arsenault via cfe-commits


@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D,
   if (D.isNoDestroy(CGM.getContext()))
     return;
 
+  // OpenMP offloading supports C++ constructors and destructors but we do not
+  // always have 'atexit' available. Instead lower these to use the LLVM global
+  // destructors which we can handle directly in the runtime.
+  if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice &&
+      !D.isStaticLocal() &&
+      (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX()))

arsenm wrote:

Oh look, it's both of my favorite patterns. Can you refine this into something 
better than language X | language Y and AMDGPU || PTX?

https://github.com/llvm/llvm-project/pull/71739


[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Joseph Huber via cfe-commits


@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D,
   if (D.isNoDestroy(CGM.getContext()))
     return;
 
+  // OpenMP offloading supports C++ constructors and destructors but we do not
+  // always have 'atexit' available. Instead lower these to use the LLVM global
+  // destructors which we can handle directly in the runtime.
+  if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice &&
+      !D.isStaticLocal() &&
+      (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX()))

jhuber6 wrote:

Yeah, these types of things are problematic especially if we consider getting 
SPIR-V support eventually. The logic basically goes like this. OpenMP supports 
global destructors but does not always support the `atexit` function. The old 
logic used to replace everything. This now at least lets CPU based targets use 
regular handling. I could make this unconditional for OpenMP, but I figured 
it'd be better to allow the CPU based targets to use the regular handling.

More or less this is just a concession to prevent regressions from this patch. 
The old logic looked like this, which did this unconditionally. Like I said, 
could remove the AMD and PTX checks and just do this on the CPU as well if it 
would be better.
```c++
  if (CGM.getLangOpts().OMPTargetTriples.empty() &&
      !CGM.getLangOpts().OpenMPIsTargetDevice)
    return false;
```

https://github.com/llvm/llvm-project/pull/71739


[llvm] [openmp] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/71739

>From 5283c5e08877b11a0eece51ca3877c9f5f8c7b82 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 7 Nov 2023 17:12:31 -0600
Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP

Summary:
This patch reworks how we handle global constructors in OpenMP.
Previously, we emitted individual kernels that were all registered and
called individually. In order to provide more generic support, this
patch moves all handling of this to the target backend and the runtime
plugin. This has the benefit of supporting the GNU extensions for
constructors and destructors, removing a class of failures related to
shared library destruction order, and allows targets other than OpenMP
to use the same support without needing to change the frontend.

This is primarily done by calling kernels that the backend emits to
iterate a list of ctor / dtor functions. For x64, this is automatic and
we get it for free with the standard `dlopen` handling. For AMDGPU, we
emit `amdgcn.device.init` and `amdgcn.device.fini` functions which
handle everything automatically and simply need to be called. For NVPTX,
a patch https://github.com/llvm/llvm-project/pull/71549 provides the
kernels to call, but the runtime needs to set up the array manually by
pulling out all the known constructor / destructor functions.

One concession that this patch requires is the change that for GPU
targets in OpenMP offloading we will use `llvm.global_dtors` instead of
using `atexit`. This is because `atexit` is a separate runtime function
that does not mesh well with the handling we're trying to do here. This
should be equivalent in all cases except for cases where we would need
to destruct manually such as:

```
void foo(); // forward declaration so ~S() can name foo()
struct S { ~S() { foo(); } };
void foo() {
  static S s;
}
```

However this is broken in many other ways on the GPU, so it is not
regressing any support, simply increasing the scope of what we can
handle.

This changes the handling of ctors / dtors. This patch now outputs an
informational message regarding the deprecation if the old format is used.
This will be completely removed in a later release.

Depends on: https://github.com/llvm/llvm-project/pull/71549
---
 clang/lib/CodeGen/CGDeclCXX.cpp   |  13 +-
 clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 --
 clang/lib/CodeGen/CGOpenMPRuntime.h   |   8 --
 clang/lib/CodeGen/CodeGenFunction.h   |   5 +
 clang/lib/CodeGen/CodeGenModule.h |  14 +-
 clang/lib/CodeGen/ItaniumCXXABI.cpp   |   7 +
 .../amdgcn_openmp_device_math_constexpr.cpp   |  48 +--
 .../amdgcn_target_global_constructor.cpp  |  30 ++--
 clang/test/OpenMP/declare_target_codegen.cpp  |   1 -
 ...x_declare_target_var_ctor_dtor_codegen.cpp |  35 +
 .../llvm/Frontend/OpenMP/OMPIRBuilder.h   |   4 -
 llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp |   7 +-
 .../plugins-nextgen/amdgpu/src/rtl.cpp|  52 +++
 .../common/PluginInterface/GlobalHandler.h|  10 +-
 .../PluginInterface/PluginInterface.cpp   |   7 +
 .../common/PluginInterface/PluginInterface.h  |  14 ++
 .../plugins-nextgen/cuda/src/rtl.cpp  | 115 
 openmp/libomptarget/src/rtl.cpp   |   6 +
 18 files changed, 290 insertions(+), 216 deletions(-)

diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp
index 3fa28b343663f61..e08a1e5f42df20c 100644
--- a/clang/lib/CodeGen/CGDeclCXX.cpp
+++ b/clang/lib/CodeGen/CGDeclCXX.cpp
@@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const VarDecl &VD,
   registerGlobalDtorWithAtExit(dtorStub);
 }
 
+/// Register a global destructor using the LLVM 'llvm.global_dtors' global.
+void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD,
+                                                 llvm::FunctionCallee Dtor,
+                                                 llvm::Constant *Addr) {
+  // Create a function which calls the destructor.
+  llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr);
+  CGM.AddGlobalDtor(dtorStub);
+}
+
 void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) {
   // extern "C" int atexit(void (*f)(void));
   assert(dtorStub->getType() ==
@@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl *D,
        D->hasAttr()))
     return;
 
-  if (getLangOpts().OpenMP &&
-      getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit))
-    return;
-
   // Check if we've already initialized this decl.
   auto I = DelayedCXXInitPosition.find(D);
   if (I != DelayedCXXInitPosition.end() && I->second == ~0U)
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index a8e1150e44566b8..d2be8141a3a4b31 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1747,136 +1747,6 @@ llvm::Function *CGOpenMPRuntime::emit

[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Joseph Huber via cfe-commits


@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D,
   if (D.isNoDestroy(CGM.getContext()))
     return;
 
+  // OpenMP offloading supports C++ constructors and destructors but we do not
+  // always have 'atexit' available. Instead lower these to use the LLVM global
+  // destructors which we can handle directly in the runtime.
+  if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice &&
+      !D.isStaticLocal() &&
+      (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX()))

jhuber6 wrote:

Just make this apply to all triples. I don't want to remove the dependency on 
the OpenMP language because this is somewhat of a hack. We can revisit this 
later if needed.

https://github.com/llvm/llvm-project/pull/71739


[compiler-rt] [llvm] [clang-tools-extra] [clang] [InferAddressSpaces] Fix constant replace to avoid modifying other functions (PR #70611)

2023-11-09 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request.

I think it would be better if we could eliminate ConstantExpr addrspacecasts 
from the IR altogether, which would avoid most of the complexity here. I would 
also somewhat prefer to push this DFS into a helper function, but can live with 
it inline as-is.

https://github.com/llvm/llvm-project/pull/70611


[clang] [clang][RISCV]: Enable --gcc-install-dir for bare metal targets (PR #71803)

2023-11-09 Thread via cfe-commits

https://github.com/mihailo-stojanovic created 
https://github.com/llvm/llvm-project/pull/71803

Fix the issue where Baremetal toolchain is created instead of the 
RISCVToolchain when GCC installation is explicitly passed via the 
gcc-install-dir option.

>From cd5e6d82eb0eb0431f38c48a800c1951d8d4b343 Mon Sep 17 00:00:00 2001
From: Mihailo Stojanovic 
Date: Tue, 19 Sep 2023 14:30:00 +0300
Subject: [PATCH] [clang][RISCV]: Enable --gcc-install-dir for bare metal
 targets

Fix the issue where Baremetal toolchain is created instead of
the RISCVToolchain when GCC installation is explicitly passed
via the gcc-install-dir option.
---
 clang/lib/Driver/ToolChains/RISCVToolchain.cpp   |  3 +++
 .../riscv64-unknown-elf/include/c++/8.2.0/.keep  |  0
 .../include/c++/8.2.0/backward/.keep |  0
 clang/test/Driver/gcc-install-dir.cpp| 12 
 4 files changed, 15 insertions(+)
 create mode 100644 clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/.keep
 create mode 100644 clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/backward/.keep

diff --git a/clang/lib/Driver/ToolChains/RISCVToolchain.cpp b/clang/lib/Driver/ToolChains/RISCVToolchain.cpp
index 7e6abd144428783..6b27ea224eb02ee 100644
--- a/clang/lib/Driver/ToolChains/RISCVToolchain.cpp
+++ b/clang/lib/Driver/ToolChains/RISCVToolchain.cpp
@@ -40,6 +40,9 @@ bool RISCVToolChain::hasGCCToolchain(const Driver &D,
   if (Args.getLastArg(options::OPT_gcc_toolchain))
     return true;
 
+  if (Args.getLastArg(options::OPT_gcc_install_dir_EQ))
+    return true;
+
   SmallString<128> GCCDir;
   llvm::sys::path::append(GCCDir, D.Dir, "..", D.getTargetTriple(),
                           "lib/crt0.o");
diff --git a/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/.keep b/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/.keep
new file mode 100644
index 000..e69de29bb2d1d64
diff --git a/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/backward/.keep b/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/backward/.keep
new file mode 100644
index 000..e69de29bb2d1d64
diff --git a/clang/test/Driver/gcc-install-dir.cpp b/clang/test/Driver/gcc-install-dir.cpp
index 955f162a2ce3a19..d22ca545508370d 100644
--- a/clang/test/Driver/gcc-install-dir.cpp
+++ b/clang/test/Driver/gcc-install-dir.cpp
@@ -37,6 +37,18 @@
 // DEBIAN_X86_64_M32-SAME: {{^}}[[SYSROOT]]/usr/lib/gcc/x86_64-linux-gnu/10/32"
 // DEBIAN_X86_64_M32-SAME: {{^}} "-L[[SYSROOT]]/usr/lib/gcc/x86_64-linux-gnu/10/../../../../lib32"
 
+/// Test GCC installation on bare-metal RISCV64.
+// RUN: %clang -### %s --target=riscv64-unknown-elf --sysroot=%S/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/ --stdlib=platform --rtlib=platform \
+// RUN:   --gcc-install-dir=%S/Inputs/multilib_riscv_elf_sdk/lib/gcc/riscv64-unknown-elf/8.2.0/ 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=ELF_RISCV64
+// ELF_RISCV64:  "-internal-isystem"
+// ELF_RISCV64-SAME: {{^}} "[[SYSROOT:[^"]+]]/include/c++/8.2.0"
+// ELF_RISCV64-SAME: {{^}} "-internal-isystem" "[[SYSROOT]]/include/c++/8.2.0/riscv64-unknown-elf/rv64imac/lp64"
+// ELF_RISCV64-SAME: {{^}} "-internal-isystem" "[[SYSROOT]]/include/c++/8.2.0/backward"
+// ELF_RISCV64:  "-L
+// ELF_RISCV64-SAME: {{^}}[[SYSROOT:[^"]+]]/lib/gcc/riscv64-unknown-elf/8.2.0/rv64imac/lp64"
+// ELF_RISCV64-SAME: {{^}} "-L[[SYSROOT]]/lib/gcc/riscv64-unknown-elf/8.2.0/../../../../riscv64-unknown-elf/lib/rv64imac/lp64"
+
 // RUN: not %clangxx %s -### --target=x86_64-unknown-linux-gnu --sysroot=%S/Inputs/debian_multiarch_tree \
 // RUN:   -ccc-install-dir %S/Inputs/basic_linux_tree/usr/bin -resource-dir=%S/Inputs/resource_dir --stdlib=platform --rtlib=platform \
 // RUN:   --gcc-install-dir=%S/Inputs/debian_multiarch_tree/usr/lib/gcc/x86_64-linux-gnu 2>&1 | FileCheck %s --check-prefix=INVALID



[clang] [clang][RISCV]: Enable --gcc-install-dir for bare metal targets (PR #71803)

2023-11-09 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-risc-v

Author: None (mihailo-stojanovic)


Changes

Fix the issue where Baremetal toolchain is created instead of the 
RISCVToolchain when GCC installation is explicitly passed via the 
gcc-install-dir option.

---
Full diff: https://github.com/llvm/llvm-project/pull/71803.diff


4 Files Affected:

- (modified) clang/lib/Driver/ToolChains/RISCVToolchain.cpp (+3) 
- (added) clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/.keep () 
- (added) clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/backward/.keep () 
- (modified) clang/test/Driver/gcc-install-dir.cpp (+12) 






https://github.com/llvm/llvm-project/pull/71803


[clang] [CodeGen] Implement post-opt linking option for builtin bitocdes (PR #69371)

2023-11-09 Thread Matt Arsenault via cfe-commits


@@ -113,7 +120,7 @@ class EmitAssemblyHelper {
   const CodeGenOptions &CodeGenOpts;
   const clang::TargetOptions &TargetOpts;
   const LangOptions &LangOpts;
-  Module *TheModule;
+  llvm::Module *TheModule;

arsenm wrote:

Why did this suddenly need qualification?

https://github.com/llvm/llvm-project/pull/69371


[clang] [CodeGen] Implement post-opt linking option for builtin bitocdes (PR #69371)

2023-11-09 Thread Matt Arsenault via cfe-commits


@@ -98,6 +100,11 @@ extern cl::opt<bool> PrintPipelinePasses;
 static cl::opt<bool> ClSanitizeOnOptimizerEarlyEP(
     "sanitizer-early-opt-ep", cl::Optional,
     cl::desc("Insert sanitizers on OptimizerEarlyEP."), cl::init(false));
+
+// Re-link builtin bitcodes after optimization
+static cl::opt<bool> ClRelinkBuiltinBitcodePostop(
+    "relink-builtin-bitcode-postop", cl::Optional,
+    cl::desc("Re-link builtin bitcodes after optimization."), cl::init(false));

arsenm wrote:

Not a proper flag? Where/how is -mlink-builtin-bitcode defined?

https://github.com/llvm/llvm-project/pull/69371


[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-09 Thread Matt Arsenault via cfe-commits


@@ -1809,6 +1816,23 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML,
   return HasVMemLoad && UsesVgprLoadedOutside;
 }
 
+bool SIInsertWaitcnts::insertWaitcntAfterMemOp(MachineFunction &MF) {
+  bool Modified = false;
+
+  for (auto &MBB : MF) {

arsenm wrote:

I think having it as a totally separate phase makes it harder to reason about 
the pass as a whole.

https://github.com/llvm/llvm-project/pull/68932


[clang] [CodeGen] Implement post-opt linking option for builtin bitocdes (PR #69371)

2023-11-09 Thread Joseph Huber via cfe-commits


@@ -98,6 +100,11 @@ extern cl::opt<bool> PrintPipelinePasses;
 static cl::opt<bool> ClSanitizeOnOptimizerEarlyEP(
     "sanitizer-early-opt-ep", cl::Optional,
     cl::desc("Insert sanitizers on OptimizerEarlyEP."), cl::init(false));
+
+// Re-link builtin bitcodes after optimization
+static cl::opt<bool> ClRelinkBuiltinBitcodePostop(
+    "relink-builtin-bitcode-postop", cl::Optional,
+    cl::desc("Re-link builtin bitcodes after optimization."), cl::init(false));

jhuber6 wrote:

That's a clang flag, this is presumably more of an LLVM one because this added 
a new pass that lives in Clang. I still think the solution to this was to just 
stop the backend from doing this optimization if it will obviously break it, 
but supposedly that caused performance regressions.

https://github.com/llvm/llvm-project/pull/69371


[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #68932)

2023-11-09 Thread Matt Arsenault via cfe-commits


@@ -1809,6 +1816,23 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML,
   return HasVMemLoad && UsesVgprLoadedOutside;
 }
 
+bool SIInsertWaitcnts::insertWaitcntAfterMemOp(MachineFunction &MF) {
+  bool Modified = false;
+
+  for (auto &MBB : MF) {

arsenm wrote:

Plus I think the two separate, but closely related cl::opts is confusing 

https://github.com/llvm/llvm-project/pull/68932


[clang] [clang][RISCV]: Enable --gcc-install-dir for bare metal targets (PR #71803)

2023-11-09 Thread via cfe-commits

https://github.com/mihailo-stojanovic closed 
https://github.com/llvm/llvm-project/pull/71803


[clang] [clang][RISCV]: Enable --gcc-install-dir for bare metal targets (PR #71803)

2023-11-09 Thread via cfe-commits

https://github.com/mihailo-stojanovic reopened 
https://github.com/llvm/llvm-project/pull/71803


[clang] [clang][RISCV]: Enable --gcc-install-dir for bare metal targets (PR #71803)

2023-11-09 Thread via cfe-commits

https://github.com/mihailo-stojanovic updated 
https://github.com/llvm/llvm-project/pull/71803

>From 3c73fdf962c2e4fc8d993a34595f21a3926710d0 Mon Sep 17 00:00:00 2001
From: Mihailo Stojanovic 
Date: Tue, 19 Sep 2023 14:30:00 +0300
Subject: [PATCH] [clang] Enable --gcc-install-dir for RISCV baremetal
 toolchains

Fix the issue where Baremetal toolchain is created instead of
the RISCVToolchain when GCC installation is explicitly passed
via the gcc-install-dir option.
---
 clang/lib/Driver/ToolChains/RISCVToolchain.cpp   |  3 +++
 .../riscv64-unknown-elf/include/c++/8.2.0/.keep  |  0
 .../include/c++/8.2.0/backward/.keep |  0
 clang/test/Driver/gcc-install-dir.cpp| 12 
 4 files changed, 15 insertions(+)
 create mode 100644 clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/.keep
 create mode 100644 clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/backward/.keep

diff --git a/clang/lib/Driver/ToolChains/RISCVToolchain.cpp b/clang/lib/Driver/ToolChains/RISCVToolchain.cpp
index 7e6abd144428783..6b27ea224eb02ee 100644
--- a/clang/lib/Driver/ToolChains/RISCVToolchain.cpp
+++ b/clang/lib/Driver/ToolChains/RISCVToolchain.cpp
@@ -40,6 +40,9 @@ bool RISCVToolChain::hasGCCToolchain(const Driver &D,
   if (Args.getLastArg(options::OPT_gcc_toolchain))
     return true;
 
+  if (Args.getLastArg(options::OPT_gcc_install_dir_EQ))
+    return true;
+
   SmallString<128> GCCDir;
   llvm::sys::path::append(GCCDir, D.Dir, "..", D.getTargetTriple(),
                           "lib/crt0.o");
diff --git a/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/.keep b/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/.keep
new file mode 100644
index 000..e69de29bb2d1d64
diff --git a/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/backward/.keep b/clang/test/Driver/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/include/c++/8.2.0/backward/.keep
new file mode 100644
index 000..e69de29bb2d1d64
diff --git a/clang/test/Driver/gcc-install-dir.cpp b/clang/test/Driver/gcc-install-dir.cpp
index 955f162a2ce3a19..d22ca545508370d 100644
--- a/clang/test/Driver/gcc-install-dir.cpp
+++ b/clang/test/Driver/gcc-install-dir.cpp
@@ -37,6 +37,18 @@
 // DEBIAN_X86_64_M32-SAME: {{^}}[[SYSROOT]]/usr/lib/gcc/x86_64-linux-gnu/10/32"
 // DEBIAN_X86_64_M32-SAME: {{^}} "-L[[SYSROOT]]/usr/lib/gcc/x86_64-linux-gnu/10/../../../../lib32"
 
+/// Test GCC installation on bare-metal RISCV64.
+// RUN: %clang -### %s --target=riscv64-unknown-elf --sysroot=%S/Inputs/multilib_riscv_elf_sdk/riscv64-unknown-elf/ --stdlib=platform --rtlib=platform \
+// RUN:   --gcc-install-dir=%S/Inputs/multilib_riscv_elf_sdk/lib/gcc/riscv64-unknown-elf/8.2.0/ 2>&1 \
+// RUN:   | FileCheck %s --check-prefix=ELF_RISCV64
+// ELF_RISCV64:  "-internal-isystem"
+// ELF_RISCV64-SAME: {{^}} "[[SYSROOT:[^"]+]]/include/c++/8.2.0"
+// ELF_RISCV64-SAME: {{^}} "-internal-isystem" "[[SYSROOT]]/include/c++/8.2.0/riscv64-unknown-elf/rv64imac/lp64"
+// ELF_RISCV64-SAME: {{^}} "-internal-isystem" "[[SYSROOT]]/include/c++/8.2.0/backward"
+// ELF_RISCV64:  "-L
+// ELF_RISCV64-SAME: {{^}}[[SYSROOT:[^"]+]]/lib/gcc/riscv64-unknown-elf/8.2.0/rv64imac/lp64"
+// ELF_RISCV64-SAME: {{^}} "-L[[SYSROOT]]/lib/gcc/riscv64-unknown-elf/8.2.0/../../../../riscv64-unknown-elf/lib/rv64imac/lp64"
+
 // RUN: not %clangxx %s -### --target=x86_64-unknown-linux-gnu --sysroot=%S/Inputs/debian_multiarch_tree \
 // RUN:   -ccc-install-dir %S/Inputs/basic_linux_tree/usr/bin -resource-dir=%S/Inputs/resource_dir --stdlib=platform --rtlib=platform \
 // RUN:   --gcc-install-dir=%S/Inputs/debian_multiarch_tree/usr/lib/gcc/x86_64-linux-gnu 2>&1 | FileCheck %s --check-prefix=INVALID



[clang] [clang] Enable --gcc-install-dir for RISCV baremetal toolchains (PR #71803)

2023-11-09 Thread via cfe-commits

https://github.com/mihailo-stojanovic edited 
https://github.com/llvm/llvm-project/pull/71803


[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Just noticed I'm actually calling the destructors backwards in AMDGPU. Will fix 
that.
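
For context, a minimal sketch of the ordering at stake, assuming the priority semantics the LLVM language reference gives for `llvm.global_ctors` (ascending priority) and `llvm.global_dtors` (descending priority). `InitEntry`, `runCtors` and `runDtors` are hypothetical names for illustration and are not the plugin code from this patch:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical model of one ctor/dtor table entry: a priority plus the
// function to run. Not the data structure used by the runtime plugins.
struct InitEntry {
  int Priority;
  std::function<void()> Fn;
};

// Constructors run in ascending priority order (lowest number first).
inline void runCtors(std::vector<InitEntry> Entries) {
  std::stable_sort(Entries.begin(), Entries.end(),
                   [](const InitEntry &A, const InitEntry &B) {
                     return A.Priority < B.Priority;
                   });
  for (const InitEntry &E : Entries) {
    assert(E.Fn && "entry must hold a callable");
    E.Fn();
  }
}

// Destructors run in descending priority order (highest number first).
inline void runDtors(std::vector<InitEntry> Entries) {
  std::stable_sort(Entries.begin(), Entries.end(),
                   [](const InitEntry &A, const InitEntry &B) {
                     return A.Priority > B.Priority;
                   });
  for (const InitEntry &E : Entries) {
    assert(E.Fn && "entry must hold a callable");
    E.Fn();
  }
}
```

Iterating the destructor table in the same direction as the constructor table would invert the intended teardown order, which is one way such a bug can arise.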

https://github.com/llvm/llvm-project/pull/71739


[llvm] [clang] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Matt Arsenault via cfe-commits


@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D,
   if (D.isNoDestroy(CGM.getContext()))
     return;
 
+  // OpenMP offloading supports C++ constructors and destructors but we do not
+  // always have 'atexit' available. Instead lower these to use the LLVM global
+  // destructors which we can handle directly in the runtime.
+  if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice &&
+      !D.isStaticLocal() &&
+      (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX()))

arsenm wrote:

Would also just hide this in a target/lang predicate that lists these 

https://github.com/llvm/llvm-project/pull/71739


[clang] [clang] Enable --gcc-install-dir for RISCV baremetal toolchains (PR #71803)

2023-11-09 Thread Kito Cheng via cfe-commits

https://github.com/kito-cheng approved this pull request.

Checked `Generic_GCC::GCCInstallationDetector::init` to make sure clang 
will use that option to search for the GCC toolchain, so LGTM.
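
A minimal sketch of the decision the patch changes, using hypothetical stand-ins (`MockArgs`, `hasGCCToolchainSketch`) rather than the real driver classes. The fallback on `lib/crt0.o` is modelled as a plain boolean; the quoted hunk only shows the path being built, so treating it as an existence check is an assumption:

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the driver's parsed argument list.
struct MockArgs {
  std::set<std::string> Present;
  bool has(const std::string &Opt) const { return Present.count(Opt) != 0; }
};

// Sketch: either explicit GCC option selects the GCC-based RISC-V toolchain;
// otherwise fall back to the crt0.o probe (assumed to be an existence check).
inline bool hasGCCToolchainSketch(const MockArgs &Args, bool Crt0Exists) {
  if (Args.has("--gcc-toolchain"))
    return true;
  if (Args.has("--gcc-install-dir")) // the case added by this patch
    return true;
  return Crt0Exists;
}

// Usage: with only --gcc-install-dir given and no crt0.o found, the
// GCC-based toolchain is still selected.
inline void exampleUsage() {
  MockArgs Args;
  Args.Present.insert("--gcc-install-dir");
  assert(hasGCCToolchainSketch(Args, /*Crt0Exists=*/false));
}
```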

https://github.com/llvm/llvm-project/pull/71803


[clang] [llvm] [openmp] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Joseph Huber via cfe-commits


@@ -2794,6 +2794,14 @@ void ItaniumCXXABI::registerGlobalDtor(CodeGenFunction &CGF, const VarDecl &D,
   if (D.isNoDestroy(CGM.getContext()))
     return;
 
+  // OpenMP offloading supports C++ constructors and destructors but we do not
+  // always have 'atexit' available. Instead lower these to use the LLVM global
+  // destructors which we can handle directly in the runtime.
+  if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsTargetDevice &&
+      !D.isStaticLocal() &&
+      (CGM.getTriple().isAMDGPU() || CGM.getTriple().isNVPTX()))

jhuber6 wrote:

So just some random helper function like "Does target support X?"
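
A minimal sketch of what such a helper could look like, written against hypothetical stand-ins (`MockLangOpts`, `MockTriple`) instead of the real Clang option and triple classes. The name `useGlobalDtorsInsteadOfAtExit` is an assumption for illustration; the `!D.isStaticLocal()` check from the quoted hunk would stay at the call site:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; these are not the Clang/LLVM classes, only
// enough state to express the predicate.
struct MockLangOpts {
  bool OpenMP = false;
  bool OpenMPIsTargetDevice = false;
};

struct MockTriple {
  std::string Arch;
  bool isAMDGPU() const { return Arch == "amdgcn"; }
  bool isNVPTX() const { return Arch == "nvptx" || Arch == "nvptx64"; }
};

// Hypothetical predicate: should global destructors be lowered to
// llvm.global_dtors instead of atexit for this language/target pair?
inline bool useGlobalDtorsInsteadOfAtExit(const MockLangOpts &LO,
                                          const MockTriple &T) {
  const bool IsOffloadDevice = LO.OpenMP && LO.OpenMPIsTargetDevice;
  const bool IsGPU = T.isAMDGPU() || T.isNVPTX();
  return IsOffloadDevice && IsGPU;
}

// Usage: an OpenMP device compilation for AMDGPU takes the new path,
// while the same language options on a CPU triple keep regular handling.
inline void exampleUsage() {
  MockLangOpts LO;
  LO.OpenMP = true;
  LO.OpenMPIsTargetDevice = true;
  MockTriple GPU;
  GPU.Arch = "amdgcn";
  MockTriple CPU;
  CPU.Arch = "x86_64";
  assert(useGlobalDtorsInsteadOfAtExit(LO, GPU));
  assert(!useGlobalDtorsInsteadOfAtExit(LO, CPU));
}
```

Keeping the language and target checks behind one named predicate means a future target such as SPIR-V would only need to be added in one place.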

https://github.com/llvm/llvm-project/pull/71739


[clang] [llvm] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)

2023-11-09 Thread via cfe-commits

https://github.com/CarolineConcatto edited 
https://github.com/llvm/llvm-project/pull/71290


[clang] [llvm] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)

2023-11-09 Thread via cfe-commits


@@ -420,6 +452,38 @@ let TargetGuard = "sve,bf16" in {
   def SVSTNT1_VNUM_BF : MInst<"svstnt1_vnum[_{d}]", "vPpld", "b", [IsStore], MemEltTyDefault, "aarch64_sve_stnt1">;
 }
 
+let TargetGuard = "sve2p1" in {
+  // Contiguous truncating store from quadword (single vector).
+  def SVST1UWQ  : MInst<"svst1uwq[_{d}]", "vPcd", "iUif",  [IsStore], MemEltTyInt32, "aarch64_sve_st1uwq">;
+  def SVST1UWQ_VNUM : MInst<"svst1uwq_vnum[_{d}]", "vPcld", "iUif", [IsStore], MemEltTyInt32, "aarch64_sve_st1uwq">;
+
+  def SVST1UDQ  : MInst<"svst1udq[_{d}]", "vPcd", "lUld",  [IsStore], MemEltTyInt64, "aarch64_sve_st1udq">;
+  def SVST1UDQ_VNUM : MInst<"svst1udq_vnum[_{d}]", "vPcld", "lUld", [IsStore], MemEltTyInt64, "aarch64_sve_st1udq">;
+
+  // Store one vector (vector base + scalar offset)
+  def SVST1Q_SCATTER_U64BASE_OFFSET : MInst<"svst1q_scatter[_{2}base]_offset[_{d}]",  "vPgld", "cUcsUsiUilUlfhdb", [IsScatterStore, IsByteIndexed], MemEltTyDefault, "aarch64_sve_st1q_scatter_scalar_offset">;
+  def SVST1Q_SCATTER_U64BASE : MInst<"svst1q_scatter[_{2}base][_{d}]",  "vPgd", "cUcsUsiUilUlfhdb", [IsScatterStore, IsByteIndexed], MemEltTyDefault, "aarch64_sve_st1q_scatter_scalar_offset">;
+
+  // Store one vector (scalar base + vector offset)
+  def SVST1Q_SCATTER_U64OFFSET : MInst<"svst1q_scatter_[{3}]offset[_{0}]", "vPpgd", "cUcsUsiUilUlfhdb", [IsScatterStore, IsByteIndexed], MemEltTyDefault, "aarch64_sve_st1q_scatter_vector_offset">;

CarolineConcatto wrote:


s/svst1q_scatter_[{3}]offset[_{0}]/svst1q_scatter_[{3}]offset[_{d}]/

https://github.com/llvm/llvm-project/pull/71290


[llvm] [clang] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)

2023-11-09 Thread via cfe-commits


@@ -298,6 +298,38 @@ let TargetGuard = "sve,bf16" in {
   def SVBFMLALT_LANE : SInst<"svbfmlalt_lane[_{0}]", "MMddi", "b", MergeNone, 
"aarch64_sve_bfmlalt_lane_v2", [IsOverloadNone], [ImmCheck<3, ImmCheck0_7>]>;
 }
 
+let TargetGuard = "sve2p1" in {
+  // Contiguous zero-extending load to quadword (single vector).
+  def SVLD1UWQ  : MInst<"svld1uwq[_{d}]", "dPc",  "iUif", [IsLoad], 
MemEltTyInt32, "aarch64_sve_ld1uwq">;
+  def SVLD1UWQ_VNUM : MInst<"svld1uwq_vnum[_{d}]", "dPcl", "iUif", [IsLoad], 
MemEltTyInt32, "aarch64_sve_ld1uwq">;
+
+  def SVLD1UDQ  : MInst<"svld1udq[_{d}]", "dPc",  "lUld", [IsLoad], 
MemEltTyInt64, "aarch64_sve_ld1udq">;
+  def SVLD1UDQ_VNUM : MInst<"svld1udq_vnum[_{d}]", "dPcl", "lUld", [IsLoad], 
MemEltTyInt64, "aarch64_sve_ld1udq">;
+
+  // Load one vector (vector base + scalar offset)
+  def SVLD1Q_GATHER_U64BASE_OFFSET : 
MInst<"svld1q_gather[_{2}base]_offset_{d}", "dPgl", "cUcsUsiUilUlfhdb", 
[IsGatherLoad, IsByteIndexed], MemEltTyDefault, 
"aarch64_sve_ld1q_gather_scalar_offset">;
+  def SVLD1Q_GATHER_U64BASE : MInst<"svld1q_gather[_{2}base]_{d}", "dPg", 
"cUcsUsiUilUlfhdb", [IsGatherLoad, IsByteIndexed], MemEltTyDefault, 
"aarch64_sve_ld1q_gather_scalar_offset">;
+
+  // Load one vector (scalar base + vector offset)
+  def SVLD1Q_GATHER_U64OFFSET : MInst<"svld1q_gather_[{3}]offset[_{0}]", 
"dPcg", "cUcsUsiUilUlfhdb", [IsGatherLoad, IsByteIndexed], MemEltTyDefault, 
"aarch64_sve_ld1q_gather_vector_offset">;
+
+  // Load N-element structure into N vectors (scalar base)
+  defm SVLD2Q : StructLoad<"svld2q[_{2}]", "2Pc", "aarch64_sve_ld2q_sret">;
+  defm SVLD3Q : StructLoad<"svld3q[_{2}]", "3Pc", "aarch64_sve_ld3q_sret">;
+  defm SVLD4Q : StructLoad<"svld4q[_{2}]", "4Pc", "aarch64_sve_ld4q_sret">;
+
+  // Load N-element structure into N vectors (scalar base, VL displacement)
+  defm SVLD2Q_VNUM : StructLoad<"svld2q_vnum[_{2}]", "2Pcl", 
"aarch64_sve_ld2q_sret">;
+  defm SVLD3Q_VNUM : StructLoad<"svld3q_vnum[_{2}]", "3Pcl", 
"aarch64_sve_ld3q_sret">;
+  defm SVLD4Q_VNUM : StructLoad<"svld4q_vnum[_{2}]", "4Pcl", 
"aarch64_sve_ld4q_sret">;
+
+  // Load quadwords (scalar base + vector index)
+  def SVLD1Q_GATHER_INDICES_U : MInst<"svld1q_gather_[{3}]index[_{0}]",
"dPcg", "sUsiUilUlbhfd", [IsGatherLoad], MemEltTyDefault, 
"aarch64_sve_ld1q_gather_index">;
+
+  // Load quadwords (vector base + scalar index)
+  def SVLD1Q_GATHER_INDEX_S   : MInst<"svld1q_gather[_{2}base]_index_{0}", 
"dPgl", "sUsiUilUlbhfd", [IsGatherLoad], MemEltTyDefault, 
"aarch64_sve_ld1q_gather_scalar_offset">;

CarolineConcatto wrote:

s/svld1q_gather[_{2}base]_index_{0}/svld1q_gather[_{2}base]_index_{d}

https://github.com/llvm/llvm-project/pull/71290


[clang] [llvm] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)

2023-11-09 Thread via cfe-commits


@@ -9497,8 +9500,11 @@ Value *CodeGenFunction::EmitSVEScatterStore(const 
SVETypeFlags &TypeFlags,
   // mapped to . However, this might be incompatible with the
   // actual type being stored. For example, when storing doubles (i64) the
   // predicated should be  instead. At the IR level the type of
-  // the predicate and the data being stored must match. Cast accordingly.
-  Ops[1] = EmitSVEPredicateCast(Ops[1], OverloadedTy);
+  // the predicate and the data being stored must match. Cast to the type
+  // expected by the intrinsic. The intrinsic itself should be defined in
+  // a way that enforces relations between parameter types.
+  Ops[1] = EmitSVEPredicateCast(
+  Ops[1], cast(F->getArg(1)->getType()));

CarolineConcatto wrote:

Is this correct? F->getArg(1) is the predicate type, no? Arg[0] = void, 
Arg[1] = predicate.
AFAIU we did not shift the function arguments.
When we do `Ops.insert(Ops.begin(), Ops.pop_back_val());`, does this also 
shift F->getArg?


https://github.com/llvm/llvm-project/pull/71290


[clang] [llvm] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)

2023-11-09 Thread via cfe-commits


@@ -298,6 +298,38 @@ let TargetGuard = "sve,bf16" in {
   def SVBFMLALT_LANE : SInst<"svbfmlalt_lane[_{0}]", "MMddi", "b", MergeNone, 
"aarch64_sve_bfmlalt_lane_v2", [IsOverloadNone], [ImmCheck<3, ImmCheck0_7>]>;
 }
 
+let TargetGuard = "sve2p1" in {
+  // Contiguous zero-extending load to quadword (single vector).
+  def SVLD1UWQ  : MInst<"svld1uwq[_{d}]", "dPc",  "iUif", [IsLoad], 
MemEltTyInt32, "aarch64_sve_ld1uwq">;
+  def SVLD1UWQ_VNUM : MInst<"svld1uwq_vnum[_{d}]", "dPcl", "iUif", [IsLoad], 
MemEltTyInt32, "aarch64_sve_ld1uwq">;
+
+  def SVLD1UDQ  : MInst<"svld1udq[_{d}]", "dPc",  "lUld", [IsLoad], 
MemEltTyInt64, "aarch64_sve_ld1udq">;
+  def SVLD1UDQ_VNUM : MInst<"svld1udq_vnum[_{d}]", "dPcl", "lUld", [IsLoad], 
MemEltTyInt64, "aarch64_sve_ld1udq">;
+
+  // Load one vector (vector base + scalar offset)
+  def SVLD1Q_GATHER_U64BASE_OFFSET : 
MInst<"svld1q_gather[_{2}base]_offset_{d}", "dPgl", "cUcsUsiUilUlfhdb", 
[IsGatherLoad, IsByteIndexed], MemEltTyDefault, 
"aarch64_sve_ld1q_gather_scalar_offset">;
+  def SVLD1Q_GATHER_U64BASE : MInst<"svld1q_gather[_{2}base]_{d}", "dPg", 
"cUcsUsiUilUlfhdb", [IsGatherLoad, IsByteIndexed], MemEltTyDefault, 
"aarch64_sve_ld1q_gather_scalar_offset">;
+
+  // Load one vector (scalar base + vector offset)
+  def SVLD1Q_GATHER_U64OFFSET : MInst<"svld1q_gather_[{3}]offset[_{0}]", 
"dPcg", "cUcsUsiUilUlfhdb", [IsGatherLoad, IsByteIndexed], MemEltTyDefault, 
"aarch64_sve_ld1q_gather_vector_offset">;
+
+  // Load N-element structure into N vectors (scalar base)
+  defm SVLD2Q : StructLoad<"svld2q[_{2}]", "2Pc", "aarch64_sve_ld2q_sret">;
+  defm SVLD3Q : StructLoad<"svld3q[_{2}]", "3Pc", "aarch64_sve_ld3q_sret">;
+  defm SVLD4Q : StructLoad<"svld4q[_{2}]", "4Pc", "aarch64_sve_ld4q_sret">;
+
+  // Load N-element structure into N vectors (scalar base, VL displacement)
+  defm SVLD2Q_VNUM : StructLoad<"svld2q_vnum[_{2}]", "2Pcl", 
"aarch64_sve_ld2q_sret">;
+  defm SVLD3Q_VNUM : StructLoad<"svld3q_vnum[_{2}]", "3Pcl", 
"aarch64_sve_ld3q_sret">;
+  defm SVLD4Q_VNUM : StructLoad<"svld4q_vnum[_{2}]", "4Pcl", 
"aarch64_sve_ld4q_sret">;
+
+  // Load quadwords (scalar base + vector index)
+  def SVLD1Q_GATHER_INDICES_U : MInst<"svld1q_gather_[{3}]index[_{0}]",
"dPcg", "sUsiUilUlbhfd", [IsGatherLoad], MemEltTyDefault, 
"aarch64_sve_ld1q_gather_index">;

CarolineConcatto wrote:

nit: remove the extra space before "dPcg".
Just in case: here we could also write
svld1q_gather_[{3}]index[_{d}]. Both are correct, because position 0 in 
"dPcg" is 'd' (d: default).
My preference would be to replace what you have with:
s/svld1q_gather_[{3}]index[_{0}]/svld1q_gather_[{3}]index[_{d}]/g
And do the same for SVLD1Q_GATHER_INDEX_S.

https://github.com/llvm/llvm-project/pull/71290


[llvm] [clang] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)

2023-11-09 Thread via cfe-commits

https://github.com/CarolineConcatto commented:

Hey Momchil,
Thank you for the work. I left some comments.
I have not finished the whole review; I still need to check the stores. I will 
wait for the answers on the loads before I continue with the stores.

https://github.com/llvm/llvm-project/pull/71290


[llvm] [clang] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)

2023-11-09 Thread via cfe-commits


@@ -420,6 +452,38 @@ let TargetGuard = "sve,bf16" in {
   def SVSTNT1_VNUM_BF : MInst<"svstnt1_vnum[_{d}]", "vPpld", "b", [IsStore], 
MemEltTyDefault, "aarch64_sve_stnt1">;
 }
 
+let TargetGuard = "sve2p1" in {
+  // Contiguous truncating store from quadword (single vector).
+  def SVST1UWQ  : MInst<"svst1uwq[_{d}]", "vPcd", "iUif",  [IsStore], 
MemEltTyInt32, "aarch64_sve_st1uwq">;
+  def SVST1UWQ_VNUM : MInst<"svst1uwq_vnum[_{d}]", "vPcld", "iUif", [IsStore], 
MemEltTyInt32, "aarch64_sve_st1uwq">;
+
+  def SVST1UDQ  : MInst<"svst1udq[_{d}]", "vPcd", "lUld",  [IsStore], 
MemEltTyInt64, "aarch64_sve_st1udq">;
+  def SVST1UDQ_VNUM : MInst<"svst1udq_vnum[_{d}]", "vPcld", "lUld", [IsStore], 
MemEltTyInt64, "aarch64_sve_st1udq">;
+
+  // Store one vector (vector base + scalar offset)
+  def SVST1Q_SCATTER_U64BASE_OFFSET : 
MInst<"svst1q_scatter[_{2}base]_offset[_{d}]",  "vPgld", "cUcsUsiUilUlfhdb", 
[IsScatterStore, IsByteIndexed], MemEltTyDefault, 
"aarch64_sve_st1q_scatter_scalar_offset">;
+  def SVST1Q_SCATTER_U64BASE : MInst<"svst1q_scatter[_{2}base][_{d}]",  
"vPgd", "cUcsUsiUilUlfhdb", [IsScatterStore, IsByteIndexed], MemEltTyDefault, 
"aarch64_sve_st1q_scatter_scalar_offset">;
+
+  // Store one vector (scalar base + vector offset)
+  def SVST1Q_SCATTER_U64OFFSET : MInst<"svst1q_scatter_[{3}]offset[_{0}]", 
"vPpgd", "cUcsUsiUilUlfhdb", [IsScatterStore, IsByteIndexed], MemEltTyDefault, 
"aarch64_sve_st1q_scatter_vector_offset">;
+
+  // Store N vectors into N-element structure (scalar base)
+  defm SVST2Q : StructStore<"svst2q[_{d}]", "vPc2", "aarch64_sve_st2q">;
+  defm SVST3Q : StructStore<"svst3q[_{d}]", "vPc3", "aarch64_sve_st3q">;
+  defm SVST4Q : StructStore<"svst4q[_{d}]", "vPc4", "aarch64_sve_st4q">;
+
+  // Store N vectors into N-element structure (scalar base, VL displacement)
+  defm SVST2Q_VNUM : StructStore<"svst2q_vnum[_{d}]", "vPcl2", 
"aarch64_sve_st2q">;
+  defm SVST3Q_VNUM : StructStore<"svst3q_vnum[_{d}]", "vPcl3", 
"aarch64_sve_st3q">;
+  defm SVST4Q_VNUM : StructStore<"svst4q_vnum[_{d}]", "vPcl4", 
"aarch64_sve_st4q">;
+
+  // Scatter store quadwords (scalar base + vector index)
+  def SVST1Q_SCATTER_INDICES_U : MInst<"svst1q_scatter_[{3}]index[_{0}]",
"vPpgd", "sUsiUilUlbhfd", [IsScatterStore], MemEltTyDefault, 
"aarch64_sve_st1q_scatter_index">;
+
+  // Scatter store quadwords (vector base + scalar index)
+  def SVST1Q_SCATTER_INDEX_S   : MInst<"svst1q_scatter[_{2}base]_index[_{0}]", 
"vPgld", "sUsiUilUlbhfd", [IsScatterStore], MemEltTyDefault, 
"aarch64_sve_st1q_scatter_scalar_offset">;

CarolineConcatto wrote:

s/svst1q_scatter[_{2}base]_index[_{0}]/svst1q_scatter[_{2}base]_index[_{d}]/

https://github.com/llvm/llvm-project/pull/71290


[clang] [llvm] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)

2023-11-09 Thread via cfe-commits


@@ -1457,6 +1457,24 @@ class AdvSIMD_GatherLoad_VS_Intrinsic
 ],
 [IntrReadMem]>;
 
+class AdvSIMD_GatherLoadQ_VS_Intrinsic
+: DefaultAttrsIntrinsic<[llvm_anyvector_ty],
+[
+  llvm_nxv1i1_ty,
+  llvm_anyvector_ty,

CarolineConcatto wrote:

So, why do we have the predicate vector as llvm_nxv1i1_ty? I was expecting 
something like 
LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, because I don't see any cast for 
the predicate under EmitSVEGatherLoad. This line 
 Ops[0] = EmitSVEPredicateCast(
  Ops[0], cast(F->getArg(0)->getType()));
would map to whatever type is in position 0.

Second, does it work if we replace the second llvm_anyvector_ty with 
llvm_nxv2i64_ty? I do think the vector will always be 64 bits.

https://github.com/llvm/llvm-project/pull/71290


[llvm] [clang] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)

2023-11-09 Thread via cfe-commits


@@ -420,6 +452,38 @@ let TargetGuard = "sve,bf16" in {
   def SVSTNT1_VNUM_BF : MInst<"svstnt1_vnum[_{d}]", "vPpld", "b", [IsStore], 
MemEltTyDefault, "aarch64_sve_stnt1">;
 }
 
+let TargetGuard = "sve2p1" in {
+  // Contiguous truncating store from quadword (single vector).
+  def SVST1UWQ  : MInst<"svst1uwq[_{d}]", "vPcd", "iUif",  [IsStore], 
MemEltTyInt32, "aarch64_sve_st1uwq">;
+  def SVST1UWQ_VNUM : MInst<"svst1uwq_vnum[_{d}]", "vPcld", "iUif", [IsStore], 
MemEltTyInt32, "aarch64_sve_st1uwq">;
+
+  def SVST1UDQ  : MInst<"svst1udq[_{d}]", "vPcd", "lUld",  [IsStore], 
MemEltTyInt64, "aarch64_sve_st1udq">;
+  def SVST1UDQ_VNUM : MInst<"svst1udq_vnum[_{d}]", "vPcld", "lUld", [IsStore], 
MemEltTyInt64, "aarch64_sve_st1udq">;
+
+  // Store one vector (vector base + scalar offset)
+  def SVST1Q_SCATTER_U64BASE_OFFSET : 
MInst<"svst1q_scatter[_{2}base]_offset[_{d}]",  "vPgld", "cUcsUsiUilUlfhdb", 
[IsScatterStore, IsByteIndexed], MemEltTyDefault, 
"aarch64_sve_st1q_scatter_scalar_offset">;
+  def SVST1Q_SCATTER_U64BASE : MInst<"svst1q_scatter[_{2}base][_{d}]",  
"vPgd", "cUcsUsiUilUlfhdb", [IsScatterStore, IsByteIndexed], MemEltTyDefault, 
"aarch64_sve_st1q_scatter_scalar_offset">;
+
+  // Store one vector (scalar base + vector offset)
+  def SVST1Q_SCATTER_U64OFFSET : MInst<"svst1q_scatter_[{3}]offset[_{0}]", 
"vPpgd", "cUcsUsiUilUlfhdb", [IsScatterStore, IsByteIndexed], MemEltTyDefault, 
"aarch64_sve_st1q_scatter_vector_offset">;
+
+  // Store N vectors into N-element structure (scalar base)
+  defm SVST2Q : StructStore<"svst2q[_{d}]", "vPc2", "aarch64_sve_st2q">;
+  defm SVST3Q : StructStore<"svst3q[_{d}]", "vPc3", "aarch64_sve_st3q">;
+  defm SVST4Q : StructStore<"svst4q[_{d}]", "vPc4", "aarch64_sve_st4q">;
+
+  // Store N vectors into N-element structure (scalar base, VL displacement)
+  defm SVST2Q_VNUM : StructStore<"svst2q_vnum[_{d}]", "vPcl2", 
"aarch64_sve_st2q">;
+  defm SVST3Q_VNUM : StructStore<"svst3q_vnum[_{d}]", "vPcl3", 
"aarch64_sve_st3q">;
+  defm SVST4Q_VNUM : StructStore<"svst4q_vnum[_{d}]", "vPcl4", 
"aarch64_sve_st4q">;
+
+  // Scatter store quadwords (scalar base + vector index)
+  def SVST1Q_SCATTER_INDICES_U : MInst<"svst1q_scatter_[{3}]index[_{0}]",
"vPpgd", "sUsiUilUlbhfd", [IsScatterStore], MemEltTyDefault, 
"aarch64_sve_st1q_scatter_index">;

CarolineConcatto wrote:

s/svst1q_scatter_[{3}]offset[_{0}]/svst1q_scatter_[{3}]offset[_{d}]/
You could also write:
svst1q_scatter_[{3}]offset[_{4}], but I would rather write it as d, because it 
does not depend on the position of the parameter.


https://github.com/llvm/llvm-project/pull/71290


[clang] [clang][Interp] Implement bitwise operations for IntegralAP (PR #71807)

2023-11-09 Thread Timm Baeder via cfe-commits

https://github.com/tbaederr created 
https://github.com/llvm/llvm-project/pull/71807

None

>From 4d13e7b92c5d6bf08554a2e251ba65b8f433fb87 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Timm=20B=C3=A4der?= 
Date: Thu, 9 Nov 2023 14:29:51 +0100
Subject: [PATCH] [clang][Interp] Implement bitwise operations for IntegralAP

---
 clang/lib/AST/Interp/IntegralAP.h | 8 +++-
 clang/test/AST/Interp/intap.cpp   | 9 +
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/clang/lib/AST/Interp/IntegralAP.h 
b/clang/lib/AST/Interp/IntegralAP.h
index 88de1f1392e6813..c8850a4bbb574aa 100644
--- a/clang/lib/AST/Interp/IntegralAP.h
+++ b/clang/lib/AST/Interp/IntegralAP.h
@@ -219,21 +219,19 @@ template  class IntegralAP final {
 
   static bool bitAnd(IntegralAP A, IntegralAP B, unsigned OpBits,
  IntegralAP *R) {
-// FIXME: Implement.
-assert(false);
+*R = IntegralAP(A.V & B.V);
 return false;
   }
 
   static bool bitOr(IntegralAP A, IntegralAP B, unsigned OpBits,
 IntegralAP *R) {
-assert(false);
+*R = IntegralAP(A.V | B.V);
 return false;
   }
 
   static bool bitXor(IntegralAP A, IntegralAP B, unsigned OpBits,
  IntegralAP *R) {
-// FIXME: Implement.
-assert(false);
+*R = IntegralAP(A.V ^ B.V);
 return false;
   }
 
diff --git a/clang/test/AST/Interp/intap.cpp b/clang/test/AST/Interp/intap.cpp
index 34c8d0565082994..a8893c8cb4eb9b8 100644
--- a/clang/test/AST/Interp/intap.cpp
+++ b/clang/test/AST/Interp/intap.cpp
@@ -157,4 +157,13 @@ namespace Bitfields {
 // expected-warning {{changes value from 100 to 0}}
 }
 
+namespace BitOps {
+  constexpr unsigned __int128 UZero = 0;
+  constexpr unsigned __int128 Max = ~UZero;
+  static_assert(Max == ~0, "");
+  static_assert((Max & 0) == 0, "");
+  static_assert((UZero | 0) == 0, "");
+  static_assert((Max ^ Max) == 0, "");
+}
+
 #endif



[clang] [clang][Interp] Implement bitwise operations for IntegralAP (PR #71807)

2023-11-09 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Timm Baeder (tbaederr)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/71807.diff


2 Files Affected:

- (modified) clang/lib/AST/Interp/IntegralAP.h (+3-5) 
- (modified) clang/test/AST/Interp/intap.cpp (+9) 


```diff
diff --git a/clang/lib/AST/Interp/IntegralAP.h 
b/clang/lib/AST/Interp/IntegralAP.h
index 88de1f1392e6813..c8850a4bbb574aa 100644
--- a/clang/lib/AST/Interp/IntegralAP.h
+++ b/clang/lib/AST/Interp/IntegralAP.h
@@ -219,21 +219,19 @@ template  class IntegralAP final {
 
   static bool bitAnd(IntegralAP A, IntegralAP B, unsigned OpBits,
  IntegralAP *R) {
-// FIXME: Implement.
-assert(false);
+*R = IntegralAP(A.V & B.V);
 return false;
   }
 
   static bool bitOr(IntegralAP A, IntegralAP B, unsigned OpBits,
 IntegralAP *R) {
-assert(false);
+*R = IntegralAP(A.V | B.V);
 return false;
   }
 
   static bool bitXor(IntegralAP A, IntegralAP B, unsigned OpBits,
  IntegralAP *R) {
-// FIXME: Implement.
-assert(false);
+*R = IntegralAP(A.V ^ B.V);
 return false;
   }
 
diff --git a/clang/test/AST/Interp/intap.cpp b/clang/test/AST/Interp/intap.cpp
index 34c8d0565082994..a8893c8cb4eb9b8 100644
--- a/clang/test/AST/Interp/intap.cpp
+++ b/clang/test/AST/Interp/intap.cpp
@@ -157,4 +157,13 @@ namespace Bitfields {
 // expected-warning {{changes value from 100 to 0}}
 }
 
+namespace BitOps {
+  constexpr unsigned __int128 UZero = 0;
+  constexpr unsigned __int128 Max = ~UZero;
+  static_assert(Max == ~0, "");
+  static_assert((Max & 0) == 0, "");
+  static_assert((UZero | 0) == 0, "");
+  static_assert((Max ^ Max) == 0, "");
+}
+
 #endif

```




https://github.com/llvm/llvm-project/pull/71807


[clang] [X86][AVX10] Permit AVX512 options/features used together with AVX10 (PR #71318)

2023-11-09 Thread Phoebe Wang via cfe-commits

https://github.com/phoebewang updated 
https://github.com/llvm/llvm-project/pull/71318

>From d9ee6309924e7f248695cbd488afe98273432e84 Mon Sep 17 00:00:00 2001
From: Phoebe Wang 
Date: Sun, 5 Nov 2023 21:15:53 +0800
Subject: [PATCH 1/3] [X86][AVX10] Permit AVX512 options/features used together
 with AVX10

This patch relaxes the driver logic to permit combinations between
AVX512 and AVX10 options and makes sure we have a unified behavior
between options and features combination.

Here are the rules we follow when handling these combinations:
1. evex512 can only be used for avx512xxx options/features. It will be
   ignored if used without them;
2. avx512xxx and avx10.xxx are options in two worlds. Avoid using them
   together in any case. It will enable a common super set when they are
   used together. E.g., "-mavx512f -mavx10.1-256" equals "-mavx10.1-512".

The compiler emits warnings when the user passes combinations like
"-mavx512f -mavx10.1-256" so that they do not get an unexpected result silently.
---
 .../clang/Basic/DiagnosticCommonKinds.td  |  2 +
 clang/lib/Basic/Targets/X86.cpp   | 57 ---
 clang/lib/Driver/ToolChains/Arch/X86.cpp  |  7 ---
 clang/lib/Headers/avx2intrin.h|  4 +-
 clang/lib/Headers/avx512bf16intrin.h  |  3 +-
 clang/lib/Headers/avx512bwintrin.h|  4 +-
 clang/lib/Headers/avx512dqintrin.h|  4 +-
 clang/lib/Headers/avx512fintrin.h |  8 ++-
 clang/lib/Headers/avx512fp16intrin.h  |  6 +-
 clang/lib/Headers/avx512ifmavlintrin.h| 10 +++-
 clang/lib/Headers/avx512pfintrin.h|  5 --
 clang/lib/Headers/avx512vbmivlintrin.h| 11 +++-
 clang/lib/Headers/avx512vlbf16intrin.h| 14 +++--
 clang/lib/Headers/avx512vlbitalgintrin.h  | 10 +++-
 clang/lib/Headers/avx512vlbwintrin.h  | 10 +++-
 clang/lib/Headers/avx512vlcdintrin.h  | 11 +++-
 clang/lib/Headers/avx512vldqintrin.h  | 10 +++-
 clang/lib/Headers/avx512vlfp16intrin.h|  4 +-
 clang/lib/Headers/avx512vlintrin.h| 10 +++-
 clang/lib/Headers/avx512vlvbmi2intrin.h   | 10 +++-
 clang/lib/Headers/avx512vlvnniintrin.h| 10 +++-
 .../lib/Headers/avx512vlvp2intersectintrin.h  | 10 ++--
 clang/lib/Headers/avx512vpopcntdqvlintrin.h   |  8 ++-
 clang/lib/Headers/avxintrin.h |  4 +-
 clang/lib/Headers/emmintrin.h |  4 +-
 clang/lib/Headers/gfniintrin.h| 14 +++--
 clang/lib/Headers/pmmintrin.h |  2 +-
 clang/lib/Headers/smmintrin.h |  2 +-
 clang/lib/Headers/tmmintrin.h |  4 +-
 clang/lib/Headers/xmmintrin.h |  4 +-
 clang/test/CodeGen/X86/avx512-error.c | 13 +
 clang/test/CodeGen/target-avx-abi-diag.c  | 28 -
 clang/test/Driver/x86-target-features.c   |  6 +-
 33 files changed, 214 insertions(+), 95 deletions(-)

diff --git a/clang/include/clang/Basic/DiagnosticCommonKinds.td 
b/clang/include/clang/Basic/DiagnosticCommonKinds.td
index 9f0ccd255a32148..8084a4ce0d1751b 100644
--- a/clang/include/clang/Basic/DiagnosticCommonKinds.td
+++ b/clang/include/clang/Basic/DiagnosticCommonKinds.td
@@ -346,6 +346,8 @@ def err_opt_not_valid_on_target : Error<
   "option '%0' cannot be specified on this target">;
 def err_invalid_feature_combination : Error<
   "invalid feature combination: %0">;
+def warn_invalid_feature_combination : Warning<
+  "invalid feature combination: %0">, 
InGroup>;
 def warn_target_unrecognized_env : Warning<
   "mismatch between architecture and environment in target triple '%0'; did 
you mean '%1'?">,
   InGroup;
diff --git a/clang/lib/Basic/Targets/X86.cpp b/clang/lib/Basic/Targets/X86.cpp
index eec3cd558435e2a..9cfda95f385d627 100644
--- a/clang/lib/Basic/Targets/X86.cpp
+++ b/clang/lib/Basic/Targets/X86.cpp
@@ -119,9 +119,13 @@ bool X86TargetInfo::initFeatureMap(
 setFeatureEnabled(Features, F, true);
 
   std::vector UpdatedFeaturesVec;
-  bool HasEVEX512 = true;
+  std::vector UpdatedAVX10FeaturesVec;
+  int HasEVEX512 = -1;
   bool HasAVX512F = false;
   bool HasAVX10 = false;
+  bool HasAVX10_512 = false;
+  std::string LastAVX10;
+  std::string LastAVX512;
   for (const auto &Feature : FeaturesVec) {
 // Expand general-regs-only to -x86, -mmx and -sse
 if (Feature == "+general-regs-only") {
@@ -131,35 +135,50 @@ bool X86TargetInfo::initFeatureMap(
   continue;
 }
 
-if (Feature.substr(0, 7) == "+avx10.") {
-  HasAVX10 = true;
-  HasAVX512F = true;
-  if (Feature.substr(Feature.size() - 3, 3) == "512") {
-HasEVEX512 = true;
-  } else if (Feature.substr(7, 2) == "1-") {
-HasEVEX512 = false;
+if (Feature.substr(1, 6) == "avx10.") {
+  if (Feature[0] == '+') {
+HasAVX10 = true;
+if (Feature.substr(Feature.size() - 3, 3) == "512")
+  HasAVX10_512 = true;
+LastAVX10 = Feature;
+  } else if (HasAVX10 && Feature == "-avx

[clang] [llvm] [AArch64] Add quadword gather load/scatter store intrinsics with unscaled vector offset (PR #71290)

2023-11-09 Thread Momchil Velikov via cfe-commits


@@ -9497,8 +9500,11 @@ Value *CodeGenFunction::EmitSVEScatterStore(const 
SVETypeFlags &TypeFlags,
   // mapped to . However, this might be incompatible with the
   // actual type being stored. For example, when storing doubles (i64) the
   // predicated should be  instead. At the IR level the type of
-  // the predicate and the data being stored must match. Cast accordingly.
-  Ops[1] = EmitSVEPredicateCast(Ops[1], OverloadedTy);
+  // the predicate and the data being stored must match. Cast to the type
+  // expected by the intrinsic. The intrinsic itself should be defined in
+  // a way that enforces relations between parameter types.
+  Ops[1] = EmitSVEPredicateCast(
+  Ops[1], cast(F->getArg(1)->getType()));

momchil-velikov wrote:

Certainly when we operate on `Ops` it does not affect `F`.
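
To illustrate the point, here is a small sketch under simplifying assumptions: `ToyIntrinsic` is a hypothetical stand-in for the intrinsic declaration `F`, operands are modelled as strings, and the parameter order shown is only illustrative. Moving the last operand to the front (the effect of `Ops.insert(Ops.begin(), Ops.pop_back_val())`) reorders the call operands only; the declared parameter list is a separate object and stays as it was.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for an intrinsic declaration: a fixed parameter list
// that exists independently of any particular call's operands.
struct ToyIntrinsic {
  std::vector<std::string> ParamTypes;
};

// Moves the last operand to the front, mirroring the effect of
// Ops.insert(Ops.begin(), Ops.pop_back_val()) on the operand list.
void moveLastOperandToFront(std::vector<std::string> &Ops) {
  if (Ops.size() > 1)
    std::rotate(Ops.begin(), Ops.end() - 1, Ops.end());
}
```

After the call, index 1 of the operand list holds the predicate operand, while index 1 of the declaration is whatever the intrinsic definition put there, independent of how the operands were shuffled.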

https://github.com/llvm/llvm-project/pull/71290


[clang] [X86][AVX10] Permit AVX512 options/features used together with AVX10 (PR #71318)

2023-11-09 Thread Phoebe Wang via cfe-commits


@@ -119,9 +119,13 @@ bool X86TargetInfo::initFeatureMap(
 setFeatureEnabled(Features, F, true);
 
   std::vector UpdatedFeaturesVec;
-  bool HasEVEX512 = true;
+  std::vector UpdatedAVX10FeaturesVec;
+  int HasEVEX512 = -1;

phoebewang wrote:

I think it's better to use an enum. It's a 3-state flag. std::optional isn't 
very useful here.
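
A rough sketch of the enum idea, assuming a simple "last flag wins" scan. `EVEX512State` and `scanEVEX512` are hypothetical names for illustration; the patch under review uses `int HasEVEX512 = -1`, and its real logic also depends on the AVX512/AVX10 features seen.

```cpp
#include <string>
#include <vector>

// Hypothetical three-state flag replacing the `int HasEVEX512 = -1` encoding:
// Unset means no evex512 feature was seen at all.
enum class EVEX512State { Unset, Disabled, Enabled };

// Scans a feature list and records the last explicit evex512 setting.
// Features other than "+evex512"/"-evex512" leave the state untouched.
EVEX512State scanEVEX512(const std::vector<std::string> &Features) {
  EVEX512State State = EVEX512State::Unset;
  for (const std::string &F : Features) {
    if (F == "+evex512")
      State = EVEX512State::Enabled;
    else if (F == "-evex512")
      State = EVEX512State::Disabled;
  }
  return State;
}
```

With named enumerators, a comparison such as `HasEVEX512 != true` on an `int` becomes `State != EVEX512State::Enabled`, which makes the "never specified" case explicit.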

https://github.com/llvm/llvm-project/pull/71318
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [X86][AVX10] Permit AVX512 options/features used together with AVX10 (PR #71318)

2023-11-09 Thread Phoebe Wang via cfe-commits


@@ -15,8 +15,12 @@
 #define __AVX2INTRIN_H
 
 /* Define the default attributes for the functions in this file. */
-#define __DEFAULT_FN_ATTRS256 __attribute__((__always_inline__, __nodebug__, 
__target__("avx2"), __min_vector_width__(256)))
-#define __DEFAULT_FN_ATTRS128 __attribute__((__always_inline__, __nodebug__, 
__target__("avx2"), __min_vector_width__(128)))
+#define __DEFAULT_FN_ATTRS256  
\
+  __attribute__((__always_inline__, __nodebug__,   
\
+ __target__("avx2,no-evex512"), __min_vector_width__(256)))

phoebewang wrote:

We have defined some AVX512 intrinsics with `no-evex512`, and some of them 
call into these AVX2 intrinsics.
We then face the problem that we cannot call them in some cases because we 
didn't specify `no-evex512` for them.

https://github.com/llvm/llvm-project/pull/71318
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [X86][AVX10] Permit AVX512 options/features used together with AVX10 (PR #71318)

2023-11-09 Thread Phoebe Wang via cfe-commits


@@ -50,11 +50,11 @@ typedef __bf16 __m128bh __attribute__((__vector_size__(16), 
__aligned__(16)));
 
 /* Define the default attributes for the functions in this file. */
 #define __DEFAULT_FN_ATTRS 
\
-  __attribute__((__always_inline__, __nodebug__, __target__("sse2"),   
\
- __min_vector_width__(128)))
+  __attribute__((__always_inline__, __nodebug__,   
\
+ __target__("sse2,no-evex512"), __min_vector_width__(128)))
 #define __DEFAULT_FN_ATTRS_MMX 
\
-  __attribute__((__always_inline__, __nodebug__, __target__("mmx,sse2"),   
\
- __min_vector_width__(64)))
+  __attribute__((__always_inline__, __nodebug__,   
\

phoebewang wrote:

The same reason as above.

https://github.com/llvm/llvm-project/pull/71318
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [X86][AVX10] Permit AVX512 options/features used together with AVX10 (PR #71318)

2023-11-09 Thread Phoebe Wang via cfe-commits


@@ -131,35 +135,50 @@ bool X86TargetInfo::initFeatureMap(
   continue;
 }
 
-if (Feature.substr(0, 7) == "+avx10.") {
-  HasAVX10 = true;
-  HasAVX512F = true;
-  if (Feature.substr(Feature.size() - 3, 3) == "512") {
-HasEVEX512 = true;
-  } else if (Feature.substr(7, 2) == "1-") {
-HasEVEX512 = false;
+if (Feature.substr(1, 6) == "avx10.") {
+  if (Feature[0] == '+') {
+HasAVX10 = true;
+if (Feature.substr(Feature.size() - 3, 3) == "512")
+  HasAVX10_512 = true;
+LastAVX10 = Feature;
+  } else if (HasAVX10 && Feature == "-avx10.1-256") {
+HasAVX10 = false;
+HasAVX10_512 = false;
+  } else if (HasAVX10_512 && Feature == "-avx10.1-512") {
+HasAVX10_512 = false;
   }
+  // Postpone AVX10 features handling after AVX512 settled.
+  UpdatedAVX10FeaturesVec.push_back(Feature);
+  continue;
 } else if (!HasAVX512F && Feature.substr(0, 7) == "+avx512") {
   HasAVX512F = true;
+  LastAVX512 = Feature;
 } else if (HasAVX512F && Feature == "-avx512f") {
   HasAVX512F = false;
-} else if (HasAVX10 && Feature == "-avx10.1-256") {
-  HasAVX10 = false;
-  HasAVX512F = false;
-} else if (!HasEVEX512 && Feature == "+evex512") {
+} else if (HasEVEX512 != true && Feature == "+evex512") {

phoebewang wrote:

I think "std::optional" doesn't help here because we need to distinguish the 
uninitialized status and false too.

https://github.com/llvm/llvm-project/pull/71318
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[llvm] [clang] [AMDGPU] Add an option to disable unsafe uses of atomic xor (PR #69229)

2023-11-09 Thread Pierre-Andre Saulais via cfe-commits

pasaulais wrote:

@arsenm, could you share the unfinished patch you were working on? I could 
start from scratch, but I don't want to duplicate the work you've already done.

https://github.com/llvm/llvm-project/pull/69229


[clang] [X86][AVX10] Permit AVX512 options/features used together with AVX10 (PR #71318)

2023-11-09 Thread Phoebe Wang via cfe-commits

phoebewang wrote:

> I'm a little bit confused, What's the expected behavior of `+avx10.1-512 
> -avx10.1-256` in codegen aspect? Should we generate only instructions in the 
> difference of sets? Or do we consider `avx10.1-256` as a base of 
> `avx10.1-512` and if it is disabled `avx10.1-512` can't be enabled?

`-avx10.1-256` works like `-avx512f`; that is, each is special as a 
fundamental feature whose removal turns off all derivative features for AVX10 
and AVX512 respectively.
OTOH, removing a derivative feature only turns off the difference set, e.g., 
`+avx10.3-256 -avx10.2-256` equals `+avx10.1-256`.
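
The rule above can be sketched as a toy model. This is only an illustration under stated assumptions: it handles just the `avx10.N-256` spelling with a single-digit `N`, tracks the highest enabled level as an integer (0 meaning AVX10 is off), and `resolveAVX10Level` is a hypothetical name, not a function in Clang.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy model: returns the highest AVX10 level left enabled after processing
// "+avx10.N-256" / "-avx10.N-256" flags in order. 0 means AVX10 is off.
int resolveAVX10Level(const std::vector<std::string> &Flags) {
  int Level = 0;
  for (const std::string &F : Flags) {
    // Only the 12-character form "<sign>avx10.N-256" is recognised here.
    if (F.size() != 12 || F.compare(1, 6, "avx10.") != 0 ||
        F.compare(8, 4, "-256") != 0)
      continue;
    int N = F[7] - '0';
    if (N < 1 || N > 9)
      continue;
    if (F[0] == '+')
      Level = std::max(Level, N); // enabling level N implies all lower levels
    else if (F[0] == '-')
      Level = std::min(Level, N - 1); // disabling N keeps only levels below N
  }
  return Level;
}
```

In this model, disabling level 1 drops the result to 0, which matches the description of `-avx10.1-256` as the fundamental feature that switches everything derived from it off.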

https://github.com/llvm/llvm-project/pull/71318


[llvm] [clang] [AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (PR #71795)

2023-11-09 Thread Matthew Devereau via cfe-commits

https://github.com/MDevereau updated 
https://github.com/llvm/llvm-project/pull/71795

>From 9846bc9efd79e6e3c2662ea42367c102df88799d Mon Sep 17 00:00:00 2001
From: Matt Devereau 
Date: Thu, 9 Nov 2023 10:50:05 +
Subject: [PATCH 1/2] [AArch64][SME2] Add ldr_zt, str_zt builtins and
 intrinsics

Adds the builtins:

void svldr_zt(uint64_t zt, const void *rn)
void svstr_zt(uint64_t zt, void *rn)

And the intrinsics:
call void @llvm.aarch64.sme.ldr.zt(i32, ptr)
tail call void @llvm.aarch64.sme.str.zt(i32, ptr)
---
 clang/include/clang/Basic/arm_sme.td  |  5 ++
 clang/include/clang/Basic/arm_sve.td  |  9 
 .../acle_sme2_ldr_str_zt.c| 51 +++
 llvm/include/llvm/IR/IntrinsicsAArch64.td | 11 ++--
 .../Target/AArch64/AArch64ISelDAGToDAG.cpp|  7 ++-
 .../Target/AArch64/AArch64ISelLowering.cpp| 21 
 llvm/lib/Target/AArch64/AArch64ISelLowering.h |  2 +
 .../Target/AArch64/AArch64RegisterInfo.cpp|  6 +++
 .../lib/Target/AArch64/AArch64SMEInstrInfo.td |  4 +-
 llvm/lib/Target/AArch64/SMEInstrFormats.td| 23 +++--
 .../CodeGen/AArch64/sme2-intrinsics-zt0.ll| 27 ++
 11 files changed, 153 insertions(+), 13 deletions(-)
 create mode 100644 
clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c
 create mode 100644 llvm/test/CodeGen/AArch64/sme2-intrinsics-zt0.ll

diff --git a/clang/include/clang/Basic/arm_sme.td 
b/clang/include/clang/Basic/arm_sme.td
index b5655afdf419ecf..fe3de56ce3298c5 100644
--- a/clang/include/clang/Basic/arm_sme.td
+++ b/clang/include/clang/Basic/arm_sme.td
@@ -298,3 +298,8 @@ multiclass ZAAddSub {
 
 defm SVADD : ZAAddSub<"add">;
 defm SVSUB : ZAAddSub<"sub">;
+
+let TargetGuard = "sme2" in {
+  def SVLDR_ZT : Inst<"svldr_zt", "viQ", "", MergeNone, "aarch64_sme_ldr_zt", 
[IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], 
[ImmCheck<0, ImmCheck0_0>]>;
+  def SVSTR_ZT : Inst<"svstr_zt", "vi%", "", MergeNone, "aarch64_sme_str_zt", 
[IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], 
[ImmCheck<0, ImmCheck0_0>]>;
+}
diff --git a/clang/include/clang/Basic/arm_sve.td 
b/clang/include/clang/Basic/arm_sve.td
index 3d4c2129565903d..f0b3747898d4145 100644
--- a/clang/include/clang/Basic/arm_sve.td
+++ b/clang/include/clang/Basic/arm_sve.td
@@ -1813,6 +1813,15 @@ def SVWHILERW_H_BF16 : SInst<"svwhilerw[_{1}]", "Pcc", 
"b", MergeNone, "aarch64_
 def SVWHILEWR_H_BF16 : SInst<"svwhilewr[_{1}]", "Pcc", "b", MergeNone, 
"aarch64_sve_whilewr_h", [IsOverloadWhileRW]>;
 }
 
+// //
+// // Spill and fill of ZT0
+// //
+
+// let TargetGuard = "sme2" in {
+//   def SVLDR_ZT : Inst<"svldr_zt", "viQ", "", "aarch64_sme_ldr_zt", 
[IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], 
[ImmCheck<0, ImmCheck0_0>]>;
+//   def SVSTR_ZT : Inst<"svstr_zt", "vi%", "", "aarch64_sme_str_zt", 
[IsOverloadNone, IsStreamingCompatible, IsSharedZA, IsPreservesZA], 
[ImmCheck<0, ImmCheck0_0>]>;
+// }
+
 

 // SVE2 - Extended table lookup/permute
 let TargetGuard = "sve2" in {
diff --git a/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c 
b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c
new file mode 100644
index 000..3d70ded6b469ba1
--- /dev/null
+++ b/clang/test/CodeGen/aarch64-sme2-intrinsics/acle_sme2_ldr_str_zt.c
@@ -0,0 +1,51 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+
+// REQUIRES: aarch64-registered-target
+
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S 
-disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S -p 
mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S 
-disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S -p 
mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | 
opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu 
-target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x 
c++ %s | opt -S -p mem2reg,instcombine,tailcallelim | FileCheck %s 
-check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme2 -S 
-disable-O0-optnone -Werror -Wall -o /dev/null %s
+
+#include 
+
+#ifdef SVE_OVERLOADED_FORMS
+// A simple used,unused... macro, long enough to represent any SVE builtin.
+#define SVE_ACLE_FUNC(A1,A2_UNUSED) A1
+#else
+#define SVE_ACLE_FUNC(A1,A2) A1##A2
+#endif
+
+// LDR ZT0
+
+// CHECK-LABEL: @test_svldr_zt(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:tail call void @llvm.aarch64.sme.ldr.zt(i32 0, ptr 
[[BASE:%.*]])
+// CHECK-NEXT:ret void
+//
+// CPP-CHECK-LABEL: @_Z13tes

[clang] [CUDA][HIP] Make template implicitly host device (PR #70369)

2023-11-09 Thread Yaxun Liu via cfe-commits

yxsamliu wrote:

ping

This patch passes our internal CI.

https://github.com/llvm/llvm-project/pull/70369


[clang] [clang] Enable --gcc-install-dir for RISCV baremetal toolchains (PR #71803)

2023-11-09 Thread Alex Bradbury via cfe-commits

asb wrote:

Tagging @MaskRay for a quick check of this too, if he has time.

https://github.com/llvm/llvm-project/pull/71803


[clang-tools-extra] [clang-tidy] Improve `container-data-pointer` check to use `c_str()` (PR #71304)

2023-11-09 Thread via cfe-commits

https://github.com/EugeneZelenko requested changes to this pull request.


https://github.com/llvm/llvm-project/pull/71304


[clang-tools-extra] [clang-tidy] Improve `container-data-pointer` check to use `c_str()` (PR #71304)

2023-11-09 Thread via cfe-commits

https://github.com/EugeneZelenko edited 
https://github.com/llvm/llvm-project/pull/71304


[clang-tools-extra] [clang-tidy] Improve `container-data-pointer` check to use `c_str()` (PR #71304)

2023-11-09 Thread via cfe-commits


@@ -3,13 +3,9 @@
 readability-container-data-pointer
 ==
 
-Finds cases where code could use ``data()`` rather than the address of the
-element at index 0 in a container. This pattern is commonly used to materialize
-a pointer to the backing data of a container. ``std::vector`` and
-``std::string`` provide a ``data()`` accessor to retrieve the data pointer 
which
-should be preferred.
+Finds cases where code references the address of the element at index 0 in a 
container and replaces them with calls to ``data()`` or ``c_str()``.
 
-This also ensures that in the case that the container is empty, the data 
pointer
+Using ``data()`` or ``c_str()`` is more readable and ensures that if the 
container is empty, the data pointer

EugeneZelenko wrote:

Please follow the 80-character line limit.

https://github.com/llvm/llvm-project/pull/71304


[openmp] [llvm] [clang] [OpenMP] Rework handling of global ctor/dtors in OpenMP (PR #71739)

2023-11-09 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/71739

>From c3df637dd2cb9a5210cb90a3bb69a63c31236039 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 7 Nov 2023 17:12:31 -0600
Subject: [PATCH] [OpenMP] Rework handling of global ctor/dtors in OpenMP

Summary:
This patch reworks how we handle global constructors in OpenMP.
Previously, we emitted individual kernels that were all registered and
called individually. In order to provide more generic support, this
patch moves all handling of this to the target backend and the runtime
plugin. This has the benefit of supporting the GNU extensions for
constructors and destructors, removing a class of failures related to
shared library destruction order, and allowing targets other than OpenMP
to use the same support without needing to change the frontend.

This is primarily done by calling kernels that the backend emits to
iterate a list of ctor / dtor functions. For x64, this is automatic and
we get it for free with the standard `dlopen` handling. For AMDGPU, we
emit `amdgcn.device.init` and `amdgcn.device.fini` functions which
handle everything automatically and simply need to be called. For NVPTX,
a patch https://github.com/llvm/llvm-project/pull/71549 provides the
kernels to call, but the runtime needs to set up the array manually by
pulling out all the known constructor / destructor functions.
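
The idea of a kernel that iterates a list of ctor / dtor functions can be
sketched in Python as below. This is only an illustrative model, not the
plugin or backend code from this patch: the `run_ctors` and `run_dtors` names
and the `(priority, function)` entry layout are assumptions, and the ordering
(constructors in ascending priority, destructors in descending priority) is
assumed to follow the documented behavior of `llvm.global_ctors` and
`llvm.global_dtors`.

```python
def run_ctors(entries):
    """Call (priority, fn) constructor entries, lowest priority value first."""
    for _, fn in sorted(entries, key=lambda entry: entry[0]):
        fn()


def run_dtors(entries):
    """Call (priority, fn) destructor entries, highest priority value first."""
    for _, fn in sorted(entries, key=lambda entry: entry[0], reverse=True):
        fn()
```

A runtime that has to build the array itself, as described for NVPTX, would
collect the entries first and then hand them to a loop of this shape.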

One concession that this patch requires is the change that for GPU
targets in OpenMP offloading we will use `llvm.global_dtors` instead of
using `atexit`. This is because `atexit` is a separate runtime function
that does not mesh well with the handling we're trying to do here. This
should be equivalent in all cases except for cases where we would need
to destruct manually such as:

```
struct S { ~S() { foo(); } };
void foo() {
  static S s;
}
```

However this is broken in many other ways on the GPU, so it is not
regressing any support, simply increasing the scope of what we can
handle.

This changes the handling of ctors / dtors. This patch now outputs an
informational message regarding the deprecation if the old format is used.
This will be completely removed in a later release.

Depends on: https://github.com/llvm/llvm-project/pull/71549
---
 clang/lib/CodeGen/CGDeclCXX.cpp   |  13 +-
 clang/lib/CodeGen/CGOpenMPRuntime.cpp | 130 --
 clang/lib/CodeGen/CGOpenMPRuntime.h   |   8 --
 clang/lib/CodeGen/CodeGenFunction.h   |   5 +
 clang/lib/CodeGen/CodeGenModule.h |  14 +-
 clang/lib/CodeGen/ItaniumCXXABI.cpp   |   7 +
 .../amdgcn_openmp_device_math_constexpr.cpp   |  48 +--
 .../amdgcn_target_global_constructor.cpp  |  30 ++--
 clang/test/OpenMP/declare_target_codegen.cpp  |   1 -
 ...x_declare_target_var_ctor_dtor_codegen.cpp |  35 +
 .../llvm/Frontend/OpenMP/OMPIRBuilder.h   |   4 -
 llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp |   7 +-
 .../plugins-nextgen/amdgpu/src/rtl.cpp|  52 +++
 .../common/PluginInterface/GlobalHandler.h|  10 +-
 .../PluginInterface/PluginInterface.cpp   |   7 +
 .../common/PluginInterface/PluginInterface.h  |  14 ++
 .../plugins-nextgen/cuda/src/rtl.cpp  | 115 
 openmp/libomptarget/src/rtl.cpp   |   6 +
 .../test/libc/global_ctor_dtor.cpp|  37 +
 19 files changed, 327 insertions(+), 216 deletions(-)
 create mode 100644 openmp/libomptarget/test/libc/global_ctor_dtor.cpp

diff --git a/clang/lib/CodeGen/CGDeclCXX.cpp b/clang/lib/CodeGen/CGDeclCXX.cpp
index 3fa28b343663f61..e08a1e5f42df20c 100644
--- a/clang/lib/CodeGen/CGDeclCXX.cpp
+++ b/clang/lib/CodeGen/CGDeclCXX.cpp
@@ -327,6 +327,15 @@ void CodeGenFunction::registerGlobalDtorWithAtExit(const 
VarDecl &VD,
   registerGlobalDtorWithAtExit(dtorStub);
 }
 
+/// Register a global destructor using the LLVM 'llvm.global_dtors' global.
+void CodeGenFunction::registerGlobalDtorWithLLVM(const VarDecl &VD,
+ llvm::FunctionCallee Dtor,
+ llvm::Constant *Addr) {
+  // Create a function which calls the destructor.
+  llvm::Function *dtorStub = createAtExitStub(VD, Dtor, Addr);
+  CGM.AddGlobalDtor(dtorStub);
+}
+
 void CodeGenFunction::registerGlobalDtorWithAtExit(llvm::Constant *dtorStub) {
   // extern "C" int atexit(void (*f)(void));
   assert(dtorStub->getType() ==
@@ -519,10 +528,6 @@ CodeGenModule::EmitCXXGlobalVarDeclInitFunc(const VarDecl 
*D,
D->hasAttr()))
 return;
 
-  if (getLangOpts().OpenMP &&
-  getOpenMPRuntime().emitDeclareTargetVarDefinition(D, Addr, PerformInit))
-return;
-
   // Check if we've already initialized this decl.
   auto I = DelayedCXXInitPosition.find(D);
   if (I != DelayedCXXInitPosition.end() && I->second == ~0U)
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index a8e1150e44566b8..d2be8141a3a4b31 100644
--- a/clang/lib/Code

[llvm] [clang] [AIX] Enable tests relating to 64-bit XCOFF object files (PR #71814)

2023-11-09 Thread Jake Egan via cfe-commits

https://github.com/jakeegan created 
https://github.com/llvm/llvm-project/pull/71814

We now have 64-bit XCOFF object file support, so these tests can be enabled 
again. However, some tests still fail due to unsupported debug sections, so I 
cleaned up their comments. 

>From 080887dca39dacdf482298b30137e494c0cbcb8b Mon Sep 17 00:00:00 2001
From: Jake Egan <5326451+jakee...@users.noreply.github.com>
Date: Thu, 9 Nov 2023 10:05:10 -0500
Subject: [PATCH] [AIX] Enable tests relating to 64-bit XCOFF object files

---
 clang/test/lit.cfg.py |  37 -
 llvm/test/lit.cfg.py  |  28 
 .../DebugInfo/DWARF/DWARFDebugInfoTest.cpp|  70 ++--
 .../DebugInfo/DWARF/DWARFDebugLineTest.cpp| 155 ++
 4 files changed, 28 insertions(+), 262 deletions(-)

diff --git a/clang/test/lit.cfg.py b/clang/test/lit.cfg.py
index 60843ef8a142048..271372b928ac55c 100644
--- a/clang/test/lit.cfg.py
+++ b/clang/test/lit.cfg.py
@@ -332,43 +332,6 @@ def calculate_arch_features(arch_string):
 config.available_features.add("llvm-driver")
 
 
-def exclude_unsupported_files_for_aix(dirname):
-for filename in os.listdir(dirname):
-source_path = os.path.join(dirname, filename)
-if os.path.isdir(source_path):
-continue
-f = open(source_path, "r", encoding="ISO-8859-1")
-try:
-data = f.read()
-# 64-bit object files are not supported on AIX, so exclude the 
tests.
-if (
-any(
-option in data
-for option in (
-"-emit-obj",
-"-fmodule-format=obj",
-"-fintegrated-as",
-)
-)
-and "64" in config.target_triple
-):
-config.excludes += [filename]
-finally:
-f.close()
-
-
-if "aix" in config.target_triple:
-for directory in (
-"/CodeGenCXX",
-"/Misc",
-"/Modules",
-"/PCH",
-"/Driver",
-"/ASTMerge/anonymous-fields",
-"/ASTMerge/injected-class-name-decl",
-):
-exclude_unsupported_files_for_aix(config.test_source_root + directory)
-
 # Some tests perform deep recursion, which requires a larger pthread stack size
 # than the relatively low default of 192 KiB for 64-bit processes on AIX. The
 # `AIXTHREAD_STK` environment variable provides a non-intrusive way to request
diff --git a/llvm/test/lit.cfg.py b/llvm/test/lit.cfg.py
index 022d1aedbdcdbb6..f3b49a398e76062 100644
--- a/llvm/test/lit.cfg.py
+++ b/llvm/test/lit.cfg.py
@@ -601,34 +601,6 @@ def have_ld64_plugin_support():
 config.available_features.add("use_msan_with_origins")
 
 
-def exclude_unsupported_files_for_aix(dirname):
-for filename in os.listdir(dirname):
-source_path = os.path.join(dirname, filename)
-if os.path.isdir(source_path):
-continue
-f = open(source_path, "r")
-try:
-data = f.read()
-# 64-bit object files are not supported on AIX, so exclude the 
tests.
-if (
-"-emit-obj" in data or "-filetype=obj" in data
-) and "64" in config.target_triple:
-config.excludes += [filename]
-finally:
-f.close()
-
-
-if "aix" in config.target_triple:
-for directory in (
-"/CodeGen/X86",
-"/DebugInfo",
-"/DebugInfo/X86",
-"/DebugInfo/Generic",
-"/LTO/X86",
-"/Linker",
-):
-exclude_unsupported_files_for_aix(config.test_source_root + directory)
-
 # Some tools support an environment variable "OBJECT_MODE" on AIX OS, which
 # controls the kind of objects they will support. If there is no "OBJECT_MODE"
 # environment variable specified, the default behaviour is to support 32-bit
diff --git a/llvm/unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp 
b/llvm/unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp
index d81557d756300c8..0b7f8f41bc53f43 100644
--- a/llvm/unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp
+++ b/llvm/unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp
@@ -33,6 +33,12 @@
 #include "gtest/gtest.h"
 #include 
 
+// AIX doesn't support debug_str_offsets or debug_addr sections
+#ifdef _AIX
+#define NO_SUPPORT_DEBUG_STR_OFFSETS
+#define NO_SUPPORT_DEBUG_ADDR
+#endif
+
 using namespace llvm;
 using namespace dwarf;
 using namespace utils;
@@ -435,11 +441,7 @@ TEST(DWARFDebugInfo, TestDWARF32Version2Addr4AllForms) {
   TestAllForms<2, AddrType, RefAddrType>();
 }
 
-#ifdef _AIX
-TEST(DWARFDebugInfo, DISABLED_TestDWARF32Version2Addr8AllForms) {
-#else
 TEST(DWARFDebugInfo, TestDWARF32Version2Addr8AllForms) {
-#endif
   // Test that we can decode all forms for DWARF32, version 2, with 4 byte
   // addresses.
   typedef uint64_t AddrType;
@@ -457,11 +459,7 @@ TEST(DWARFDebugInfo, TestDWARF32Version3Addr4AllForms) {
   TestAllForms<3, AddrType
