[llvm-branch-commits] [BOLT] Encode landing pads in BAT (PR #114602)

2024-11-01 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-bolt

Author: Amir Ayupov (aaupov)


Changes

Reuse secondary entry points vector and include landing pad offsets.
Use LSB to encode LPENTRY bit, similar to BRANCHENTRY bit used to
distinguish branch and block entries in the address map.

Test Plan: updated bolt-address-translation.test


---
Full diff: https://github.com/llvm/llvm-project/pull/114602.diff


4 Files Affected:

- (modified) bolt/docs/BAT.md (+8-4) 
- (modified) bolt/include/bolt/Profile/BoltAddressTranslation.h (+3) 
- (modified) bolt/lib/Profile/BoltAddressTranslation.cpp (+41-29) 
- (modified) bolt/test/X86/callcont-fallthru.s (+7-1) 


``diff
diff --git a/bolt/docs/BAT.md b/bolt/docs/BAT.md
index 817ad288aa34ba..3b42c36541acd3 100644
--- a/bolt/docs/BAT.md
+++ b/bolt/docs/BAT.md
@@ -54,7 +54,7 @@ Functions table:
 |  table   |
 |  |
 | Secondary entry  |
-|  points  |
+|  points and LPs  |
 |--|
 
 ```
@@ -80,7 +80,7 @@ Hot indices are delta encoded, implicitly starting at zero.
 | `HotIndex` | Delta, ULEB128 | Index of corresponding hot function in hot 
functions table | Cold |
 | `FuncHash` | 8b | Function hash for input function | Hot |
 | `NumBlocks` | ULEB128 | Number of basic blocks in the original function | 
Hot |
-| `NumSecEntryPoints` | ULEB128 | Number of secondary entry points in the original function | Hot |
+| `NumSecEntryPoints` | ULEB128 | Number of secondary entry points and landing pads in the original function | Hot |
 | `ColdInputSkew` | ULEB128 | Skew to apply to all input offsets | Cold |
 | `NumEntries` | ULEB128 | Number of address translation entries for a 
function | Both |
 | `EqualElems` | ULEB128 | Number of equal offsets in the beginning of a 
function | Both |
@@ -116,7 +116,11 @@ input basic block mapping.
 
 ### Secondary Entry Points table
 The table is emitted for hot fragments only. It contains `NumSecEntryPoints`
-offsets denoting secondary entry points, delta encoded, implicitly starting at zero.
+offsets denoting secondary entry points and landing pads, delta encoded,
+implicitly starting at zero.
 | Entry | Encoding | Description |
 | --- | --- | --- |
-| `SecEntryPoint` | Delta, ULEB128 | Secondary entry point offset |
+| `SecEntryPoint` | Delta, ULEB128 | Secondary entry point offset with `LPENTRY` LSB bit |
+
+`LPENTRY` bit denotes whether a given offset is a landing pad block. If not set,
+the offset is a secondary entry point.
diff --git a/bolt/include/bolt/Profile/BoltAddressTranslation.h 
b/bolt/include/bolt/Profile/BoltAddressTranslation.h
index 65b9ba874368f3..62367ca3aebdce 100644
--- a/bolt/include/bolt/Profile/BoltAddressTranslation.h
+++ b/bolt/include/bolt/Profile/BoltAddressTranslation.h
@@ -181,6 +181,9 @@ class BoltAddressTranslation {
   /// translation map entry
   const static uint32_t BRANCHENTRY = 0x1;
 
+  /// Identifies a landing pad in secondary entry point map entry.
+  const static uint32_t LPENTRY = 0x1;
+
 public:
   /// Map basic block input offset to a basic block index and hash pair.
   class BBHashMapTy {
diff --git a/bolt/lib/Profile/BoltAddressTranslation.cpp 
b/bolt/lib/Profile/BoltAddressTranslation.cpp
index ec7e303c0f52e8..9ce62052653e36 100644
--- a/bolt/lib/Profile/BoltAddressTranslation.cpp
+++ b/bolt/lib/Profile/BoltAddressTranslation.cpp
@@ -86,21 +86,16 @@ void BoltAddressTranslation::write(const BinaryContext &BC, 
raw_ostream &OS) {
 if (Function.isIgnored() || (!BC.HasRelocations && !Function.isSimple()))
   continue;
 
-uint32_t NumSecondaryEntryPoints = 0;
-Function.forEachEntryPoint([&](uint64_t Offset, const MCSymbol *) {
-  if (!Offset)
-return true;
-  ++NumSecondaryEntryPoints;
-  SecondaryEntryPointsMap[OutputAddress].push_back(Offset);
-  return true;
-});
-
 LLVM_DEBUG(dbgs() << "Function name: " << Function.getPrintName() << "\n");
 LLVM_DEBUG(dbgs() << " Address reference: 0x"
   << Twine::utohexstr(Function.getOutputAddress()) << 
"\n");
 LLVM_DEBUG(dbgs() << formatv(" Hash: {0:x}\n", getBFHash(InputAddress)));
-LLVM_DEBUG(dbgs() << " Secondary Entry Points: " << NumSecondaryEntryPoints
-  << '\n');
+LLVM_DEBUG({
+  uint32_t NumSecondaryEntryPoints = 0;
+  if (SecondaryEntryPointsMap.count(InputAddress))
+NumSecondaryEntryPoints = SecondaryEntryPointsMap[InputAddress].size();
+  dbgs() << " Secondary Entry Points: " << NumSecondaryEntryPoints << '\n';
+});
 
 MapTy Map;
 for (const BinaryBasicBlock *const BB :
@@ -207,10 +202,9 @@ void BoltAddressTranslation::writeMaps(std::map<uint64_t, MapTy> &Maps,
   << Twine::utohexstr(Address) << ".\n");
 encodeULEB128(Address - PrevAddress, OS);
 PrevAddress = Address;
-const uint32_t NumSecondaryEntryPoints =
-SecondaryEntryPointsMap.count(Address)
-? SecondaryEntryPointsMap[Address].size()
-: 0;
+u

[llvm-branch-commits] [BOLT] Encode landing pads in BAT (PR #114602)

2024-11-01 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov created 
https://github.com/llvm/llvm-project/pull/114602

Reuse secondary entry points vector and include landing pad offsets.
Use LSB to encode LPENTRY bit, similar to BRANCHENTRY bit used to
distinguish branch and block entries in the address map.

Test Plan: updated bolt-address-translation.test
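The encoding this PR describes (delta-encoded ULEB128 offsets carrying the `LPENTRY` flag in the least significant bit) can be sketched standalone. This is an illustrative model of the format, not BOLT's actual implementation; the helper names and the assumption that the delta is taken over the flag-carrying values (mirroring how `BRANCHENTRY` works in the address map) are mine:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Append a ULEB128-encoded value to Out.
void encodeULEB128(uint64_t Value, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (Value);
}

// Encode secondary entry point / landing pad offsets. Each offset is
// shifted left by one so the LSB can carry the LPENTRY (landing pad)
// bit; entries are assumed sorted, and the flag-carrying values are
// delta encoded, implicitly starting at zero.
std::vector<uint8_t>
encodeSecEntryPoints(const std::vector<std::pair<uint64_t, bool>> &Offsets) {
  std::vector<uint8_t> Out;
  uint64_t Prev = 0;
  for (const auto &[Offset, IsLandingPad] : Offsets) {
    uint64_t Encoded = (Offset << 1) | (IsLandingPad ? 1 : 0);
    encodeULEB128(Encoded - Prev, Out);
    Prev = Encoded;
  }
  return Out;
}
```

A reader can recover the flag with `Encoded & 1` and the offset with `Encoded >> 1` after reconstructing the running sum of deltas.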



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Make `AAAMDWavesPerEU` honor existing attribute (PR #114438)

2024-11-01 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/114438

>From fe1979082eea45d70ac6b6112f2eb4c4fdb2fa72 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Thu, 31 Oct 2024 12:49:07 -0400
Subject: [PATCH] [WIP][AMDGPU][Attributor] Make `AAAMDWavesPerEU` honor
 existing attribute

---
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp   | 79 +++
 .../annotate-kernel-features-hsa-call.ll  | 46 +--
 .../AMDGPU/attributor-loop-issue-58639.ll |  3 +-
 .../CodeGen/AMDGPU/direct-indirect-call.ll|  3 +-
 .../CodeGen/AMDGPU/propagate-waves-per-eu.ll  | 59 +++---
 .../AMDGPU/remove-no-kernel-id-attribute.ll   |  9 ++-
 .../AMDGPU/uniform-work-group-multistep.ll|  3 +-
 .../uniform-work-group-recursion-test.ll  |  2 +-
 8 files changed, 111 insertions(+), 93 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 8b111cf15575a6..ba2b6159c4f0a2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -198,6 +198,17 @@ class AMDGPUInformationCache : public InformationCache {
 return ST.getWavesPerEU(F, FlatWorkGroupSize);
   }
 
+  std::optional<std::pair<unsigned, unsigned>>
+  getWavesPerEUAttr(const Function &F) {
+auto Val = AMDGPU::getIntegerPairAttribute(F, "amdgpu-waves-per-eu",
+   /*OnlyFirstRequired=*/true);
+if (Val && Val->second == 0) {
+  const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+  Val->second = ST.getMaxWavesPerEU();
+}
+return Val;
+  }
+
   std::pair<unsigned, unsigned>
   getEffectiveWavesPerEU(const Function &F,
                          std::pair<unsigned, unsigned> WavesPerEU,
@@ -768,22 +779,6 @@ struct AAAMDSizeRangeAttribute
/*ForceReplace=*/true);
   }
 
-  ChangeStatus emitAttributeIfNotDefault(Attributor &A, unsigned Min,
- unsigned Max) {
-// Don't add the attribute if it's the implied default.
-if (getAssumed().getLower() == Min && getAssumed().getUpper() - 1 == Max)
-  return ChangeStatus::UNCHANGED;
-
-Function *F = getAssociatedFunction();
-LLVMContext &Ctx = F->getContext();
-SmallString<10> Buffer;
-raw_svector_ostream OS(Buffer);
-OS << getAssumed().getLower() << ',' << getAssumed().getUpper() - 1;
-return A.manifestAttrs(getIRPosition(),
-   {Attribute::get(Ctx, AttrName, OS.str())},
-   /*ForceReplace=*/true);
-  }
-
   const std::string getAsStr(Attributor *) const override {
 std::string Str;
 raw_string_ostream OS(Str);
@@ -879,29 +874,47 @@ struct AAAMDWavesPerEU : public AAAMDSizeRangeAttribute {
   AAAMDWavesPerEU(const IRPosition &IRP, Attributor &A)
   : AAAMDSizeRangeAttribute(IRP, A, "amdgpu-waves-per-eu") {}
 
-  bool isValidState() const override {
-return !Assumed.isEmptySet() && IntegerRangeState::isValidState();
-  }
-
   void initialize(Attributor &A) override {
 Function *F = getAssociatedFunction();
 auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());
 
-if (const auto *AssumedGroupSize = A.getAAFor<AAAMDFlatWorkGroupSize>(
-*this, IRPosition::function(*F), DepClassTy::REQUIRED);
-AssumedGroupSize->isValidState()) {
+auto TakeRange = [&](std::pair<unsigned, unsigned> R) {
+  auto [Min, Max] = R;
+  ConstantRange Range(APInt(32, Min), APInt(32, Max + 1));
+  IntegerRangeState RangeState(Range);
+  clampStateAndIndicateChange(this->getState(), RangeState);
+  indicateOptimisticFixpoint();
+};
 
-  unsigned Min, Max;
-  std::tie(Min, Max) = InfoCache.getWavesPerEU(
-  *F, {AssumedGroupSize->getAssumed().getLower().getZExtValue(),
-   AssumedGroupSize->getAssumed().getUpper().getZExtValue() - 1});
+std::pair<unsigned, unsigned> MaxWavesPerEURange{
+1U, InfoCache.getMaxWavesPerEU(*F)};
 
-  ConstantRange Range(APInt(32, Min), APInt(32, Max + 1));
-  intersectKnown(Range);
+// If the attribute exists, we will honor it if it is not the default.
+if (auto Attr = InfoCache.getWavesPerEUAttr(*F)) {
+  if (*Attr != MaxWavesPerEURange) {
+TakeRange(*Attr);
+return;
+  }
 }
 
-if (AMDGPU::isEntryFunctionCC(F->getCallingConv()))
-  indicatePessimisticFixpoint();
+// Unlike AAAMDFlatWorkGroupSize, it's getting trickier here. Since the
+// calculation of waves per EU involves flat work group size, we can't
+// simply use an assumed flat work group size as a start point, because the
+// update of flat work group size is in an inverse direction of waves per
+// EU. However, we can still do something if it is an entry function. Since
+// an entry function is a terminal node, and flat work group size either
+// from attribute or default will be used anyway, we can take that value 
and
+// calculate the waves per EU based on it. This result can't be updated by
+// any means, but that could still allow us 

[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline (PR #114577)

2024-11-01 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/114577

>From 488643ca48229d9c48d9b28916fd887b8be15205 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Fri, 1 Nov 2024 12:39:52 -0400
Subject: [PATCH] [PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline

---
 clang/lib/CodeGen/BackendUtil.cpp | 22 +
 llvm/include/llvm/Passes/PassBuilder.h| 20 +++-
 llvm/lib/Passes/PassBuilderPipelines.cpp  | 24 +++
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 16 -
 .../CodeGen/AMDGPU/print-pipeline-passes.ll   |  1 +
 llvm/tools/opt/NewPMDriver.cpp|  4 ++--
 6 files changed, 53 insertions(+), 34 deletions(-)

diff --git a/clang/lib/CodeGen/BackendUtil.cpp 
b/clang/lib/CodeGen/BackendUtil.cpp
index 47a30f00612eb7..70035a5e069a90 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -674,7 +674,7 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 
   // Ensure we lower KCFI operand bundles with -O0.
   PB.registerOptimizerLastEPCallback(
-  [&](ModulePassManager &MPM, OptimizationLevel Level) {
+  [&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) 
{
 if (Level == OptimizationLevel::O0 &&
 LangOpts.Sanitize.has(SanitizerKind::KCFI))
   MPM.addPass(createModuleToFunctionPassAdaptor(KCFIPass()));
@@ -693,8 +693,8 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 static void addSanitizers(const Triple &TargetTriple,
   const CodeGenOptions &CodeGenOpts,
   const LangOptions &LangOpts, PassBuilder &PB) {
-  auto SanitizersCallback = [&](ModulePassManager &MPM,
-OptimizationLevel Level) {
+  auto SanitizersCallback = [&](ModulePassManager &MPM, OptimizationLevel 
Level,
+ThinOrFullLTOPhase) {
 if (CodeGenOpts.hasSanitizeCoverage()) {
   auto SancovOpts = getSancovOptsFromCGOpts(CodeGenOpts);
   MPM.addPass(SanitizerCoveragePass(
@@ -778,9 +778,10 @@ static void addSanitizers(const Triple &TargetTriple,
   };
   if (ClSanitizeOnOptimizerEarlyEP) {
 PB.registerOptimizerEarlyEPCallback(
-[SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level) {
+[SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level,
+ ThinOrFullLTOPhase Phase) {
   ModulePassManager NewMPM;
-  SanitizersCallback(NewMPM, Level);
+  SanitizersCallback(NewMPM, Level, Phase);
   if (!NewMPM.isEmpty()) {
 // Sanitizers can abandon.
 NewMPM.addPass(RequireAnalysisPass());
@@ -1058,11 +1059,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
 // TODO: Consider passing the MemoryProfileOutput to the pass builder via
 // the PGOOptions, and set this up there.
 if (!CodeGenOpts.MemoryProfileOutput.empty()) {
-  PB.registerOptimizerLastEPCallback(
-  [](ModulePassManager &MPM, OptimizationLevel Level) {
-MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
-MPM.addPass(ModuleMemProfilerPass());
-  });
+  PB.registerOptimizerLastEPCallback([](ModulePassManager &MPM,
+OptimizationLevel Level,
+ThinOrFullLTOPhase) {
+MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
+MPM.addPass(ModuleMemProfilerPass());
+  });
 }
 
 if (CodeGenOpts.FatLTO) {
diff --git a/llvm/include/llvm/Passes/PassBuilder.h 
b/llvm/include/llvm/Passes/PassBuilder.h
index 565fd2ab2147e5..e7bc3a58f414f1 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -490,7 +490,8 @@ class PassBuilder {
   /// This extension point allows adding optimizations before the function
   /// optimization pipeline.
   void registerOptimizerEarlyEPCallback(
-  const std::function<void(ModulePassManager &, OptimizationLevel)> &C) {
+  const std::function<void(ModulePassManager &, OptimizationLevel, ThinOrFullLTOPhase)> &C) {
 OptimizerEarlyEPCallbacks.push_back(C);
   }
 
@@ -499,7 +500,8 @@ class PassBuilder {
   /// This extension point allows adding optimizations at the very end of the
   /// function optimization pipeline.
   void registerOptimizerLastEPCallback(
-  const std::function<void(ModulePassManager &, OptimizationLevel)> &C) {
+  const std::function<void(ModulePassManager &, OptimizationLevel, ThinOrFullLTOPhase)> &C) {
 OptimizerLastEPCallbacks.push_back(C);
   }
 
@@ -630,9 +632,11 @@ class PassBuilder {
   void invokeVectorizerStartEPCallbacks(FunctionPassManager &FPM,
 OptimizationLevel Level);
   void invokeOptimizerEarlyEPCallbacks(ModulePassManager &MPM,
-   OptimizationLevel Level);
+   OptimizationLevel Level,
+   ThinOrFullLTOPhase Phase);
   void invokeOpt

[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed TLSDESC (PR #113817)

2024-11-01 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi commented:

I think this is basically good from my perspective, but I'd like one of the 
longtime LLD maintainers, and maybe someone more experienced w/ PAC to chime in 
before landing. Maybe @smithp35, @MaskRay, or @kbeyls have some thoughts?

https://github.com/llvm/llvm-project/pull/113817


[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed TLSDESC (PR #113817)

2024-11-01 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi edited 
https://github.com/llvm/llvm-project/pull/113817


[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed TLSDESC (PR #113817)

2024-11-01 Thread Paul Kirth via llvm-branch-commits


@@ -0,0 +1,134 @@
+// REQUIRES: aarch64
+// RUN: rm -rf %t && split-file %s %t && cd %t
+
+//--- a.s
+
+.section .tbss,"awT",@nobits
+.global a
+a:
+.xword 0
+
+//--- ok.s
+
+// RUN: llvm-mc -filetype=obj -triple=aarch64-pc-linux -mattr=+pauth ok.s -o 
ok.o
+// RUN: ld.lld -shared ok.o -o ok.so
+// RUN: llvm-objdump --no-print-imm-hex -d --no-show-raw-insn ok.so | \
+// RUN:   FileCheck -DP=20 -DA=896 -DB=912 -DC=928 %s
+// RUN: llvm-readobj -r -x .got ok.so | FileCheck --check-prefix=REL \
+// RUN:   -DP1=20 -DA1=380 -DB1=390 -DC1=3A0 -DP2=020 -DA2=380 -DB2=390 
-DC2=3a0 %s
+
+// RUN: llvm-mc -filetype=obj -triple=aarch64-pc-linux -mattr=+pauth a.s -o 
a.so.o
+// RUN: ld.lld -shared a.so.o -soname=so -o a.so
+// RUN: ld.lld ok.o a.so -o ok
+// RUN: llvm-objdump --no-print-imm-hex -d --no-show-raw-insn ok | \
+// RUN:   FileCheck -DP=220 -DA=936 -DB=952 -DC=968 %s
+// RUN: llvm-readobj -r -x .got ok | FileCheck --check-prefix=REL \
+// RUN:   -DP1=220 -DA1=3A8 -DB1=3B8 -DC1=3C8 -DP2=220 -DA2=3a8 -DB2=3b8 
-DC2=3c8 %s
+
+.text
+adrp    x0, :tlsdesc_auth:a
+ldr x16, [x0, :tlsdesc_auth_lo12:a]
+add x0, x0, :tlsdesc_auth_lo12:a
+.tlsdesccall a
+blraa   x16, x0
+
+// CHECK:  adrp    x0, 0x[[P]]000
+// CHECK-NEXT: ldr x16, [x0, #[[A]]]
+// CHECK-NEXT: add x0, x0, #[[A]]
+// CHECK-NEXT: blraa   x16, x0
+
+// Create relocation against local TLS symbols where linker should
+// create target specific dynamic TLSDESC relocation where addend is
+// the symbol VMA in tls block.
+
+adrp    x0, :tlsdesc_auth:local1
+ldr x16, [x0, :tlsdesc_auth_lo12:local1]
+add x0, x0, :tlsdesc_auth_lo12:local1
+.tlsdesccall local1
+blraa   x16, x0
+
+// CHECK:  adrp    x0, 0x[[P]]000
+// CHECK-NEXT: ldr x16, [x0, #[[B]]]
+// CHECK-NEXT: add x0, x0, #[[B]]
+// CHECK-NEXT: blraa   x16, x0
+
+adrp    x0, :tlsdesc_auth:local2
+ldr x16, [x0, :tlsdesc_auth_lo12:local2]
+add x0, x0, :tlsdesc_auth_lo12:local2
+.tlsdesccall local2
+blraa   x16, x0
+
+// CHECK:  adrp    x0, 0x[[P]]000
+// CHECK-NEXT: ldr x16, [x0, #[[C]]]
+// CHECK-NEXT: add x0, x0, #[[C]]
+// CHECK-NEXT: blraa   x16, x0
+
+.section .tbss,"awT",@nobits
+.type   local1,@object
+.p2align 2
+local1:
+.word   0
+.size   local1, 4
+
+.type   local2,@object
+.p2align 3
+local2:
+.xword  0
+.size   local2, 8
+
+
+// R_AARCH64_AUTH_TLSDESC - 0x0 -> start of tls block
+// R_AARCH64_AUTH_TLSDESC - 0x8 -> align (sizeof (local1), 8)
+
+// REL:  Relocations [
+// REL-NEXT:   Section (5) .rela.dyn {
+// REL-NEXT: 0x[[P1]][[B1]] R_AARCH64_AUTH_TLSDESC - 0x0
+// REL-NEXT: 0x[[P1]][[C1]] R_AARCH64_AUTH_TLSDESC - 0x8
+// REL-NEXT: 0x[[P1]][[A1]] R_AARCH64_AUTH_TLSDESC a 0x0
+// REL-NEXT:   }
+// REL-NEXT: ]
+
+// REL:  Hex dump of section '.got':
+// REL-NEXT: 0x00[[P2]][[A2]]  0080  00a0
+// REL-NEXT: 0x00[[P2]][[B2]]  0080  00a0
+// REL-NEXT: 0x00[[P2]][[C2]]  0080  00a0
+//   ^^
//   0b1000 bit 63 address diversity = true, bits 61..60 key = IA
// ^^
// 0b1010 bit 63 address diversity = true, bits 61..60 key = DA

ilovepi wrote:

Should these be checked?

https://github.com/llvm/llvm-project/pull/113817
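The signing-schema words being dumped above can be reproduced with a small helper. The field placement (bit 63 = address diversity, bits 61..60 = PAC key, with IA = 0b00 and DA = 0b10) follows the comments in the quoted test, and `schemaWord` is an illustrative name, not part of lld:

```cpp
#include <cstdint>

// Build the 64-bit signing-schema word of an AUTH GOT entry:
// bit 63 encodes address diversity, bits 61..60 encode the PAC key.
constexpr uint64_t schemaWord(bool AddrDiversity, uint8_t Key) {
  return (uint64_t(AddrDiversity) << 63) | ((uint64_t(Key) & 0b11) << 60);
}

constexpr uint8_t KeyIA = 0b00;
constexpr uint8_t KeyDA = 0b10;
```

With address diversity on, the IA entry's top byte is 0x80 and the DA entry's is 0xa0, matching the `.got` hex dump in the test.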


[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed TLSDESC (PR #113817)

2024-11-01 Thread Paul Kirth via llvm-branch-commits


@@ -1355,6 +1355,36 @@ unsigned RelocationScanner::handleTlsRelocation(RelExpr 
expr, RelType type,
 return 1;
   }
 
+  auto fatalBothAuthAndNonAuth = [&sym]() {
+fatal("both AUTH and non-AUTH TLSDESC entries for '" + sym.getName() +
+  "' requested, but only one type of TLSDESC entry per symbol is "
+  "supported");
+  };
+
+  // Do not optimize signed TLSDESC as described in pauthabielf64 to LE/IE.
+  // 
https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#general-restrictions
+  // > PAUTHELF64 only supports the descriptor based TLS (TLSDESC).
+  if (oneof(
+  expr)) {
+assert(ctx.arg.emachine == EM_AARCH64);
+if (!sym.hasFlag(NEEDS_TLSDESC))
+  sym.setFlags(NEEDS_TLSDESC | NEEDS_TLSDESC_AUTH);
+else if (!sym.hasFlag(NEEDS_TLSDESC_AUTH))
+  fatalBothAuthAndNonAuth();
+sec->addReloc({expr, type, offset, addend, &sym});
+return 1;
+  }
+
+  if (sym.hasFlag(NEEDS_TLSDESC_AUTH)) {
+assert(ctx.arg.emachine == EM_AARCH64);
+// TLSDESC_CALL hint relocation probably should not be emitted by compiler
+// with signed TLSDESC enabled since it does not give any value, but leave 
a
+// check against that just in case someone uses it.
+if (expr != R_TLSDESC_CALL)
+  fatalBothAuthAndNonAuth();

ilovepi wrote:

Thanks for the clarification. I was thinking something could reach here w/ 
`NEEDS_TLSDESC_AUTH` set that isn't a `TLSDESC_CALL` reloc, but wouldn't be 
both Auth and non-Auth (e.g. just plain invalid, rather than this flavor of 
invalid).

https://github.com/llvm/llvm-project/pull/113817
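The flag handling discussed in this thread can be modeled standalone: the first AUTH TLSDESC relocation against a symbol sets both flags, and a later relocation of the other flavor against the same symbol is rejected. A hypothetical sketch (the real code issues `fatal` rather than returning a bool, and the flag values here are invented):

```cpp
#include <cstdint>

enum : uint16_t { NEEDS_TLSDESC = 1 << 0, NEEDS_TLSDESC_AUTH = 1 << 1 };

// Record an AUTH TLSDESC use; returns false on AUTH/non-AUTH mixing.
bool addAuthTlsdesc(uint16_t &Flags) {
  if (!(Flags & NEEDS_TLSDESC)) {
    Flags |= NEEDS_TLSDESC | NEEDS_TLSDESC_AUTH; // first use: claim AUTH
    return true;
  }
  // Already needs TLSDESC: only OK if it was also the AUTH flavor.
  return (Flags & NEEDS_TLSDESC_AUTH) != 0;
}

// Record a plain (non-AUTH) TLSDESC use; returns false on mixing.
bool addPlainTlsdesc(uint16_t &Flags) {
  if (Flags & NEEDS_TLSDESC_AUTH)
    return false; // symbol already claimed by AUTH TLSDESC
  Flags |= NEEDS_TLSDESC;
  return true;
}
```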


[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed GOT with tiny code model (PR #113816)

2024-11-01 Thread Paul Kirth via llvm-branch-commits


@@ -78,6 +78,79 @@ _start:
   adrp x1, :got_auth:zed
   add  x1, x1, :got_auth_lo12:zed
 
+#--- ok-tiny.s
+
+# RUN: llvm-mc -filetype=obj -triple=aarch64-none-linux ok-tiny.s -o ok-tiny.o
+
+# RUN: ld.lld ok-tiny.o a.so -pie -o external-tiny
+# RUN: llvm-readelf -r -S -x .got external-tiny | FileCheck %s 
--check-prefix=EXTERNAL-TINY
+
+# RUN: ld.lld ok-tiny.o a.o -pie -o local-tiny
+# RUN: llvm-readelf -r -S -x .got -s local-tiny | FileCheck %s 
--check-prefix=LOCAL-TINY
+
+# EXTERNAL-TINY:  OffsetInfo Type  
  Symbol's Value   Symbol's Name + Addend
+# EXTERNAL-TINY-NEXT: 00020380  0001e201 
R_AARCH64_AUTH_GLOB_DAT  bar + 0
+# EXTERNAL-TINY-NEXT: 00020388  0002e201 
R_AARCH64_AUTH_GLOB_DAT  zed + 0
+
+## Symbol's values for bar and zed are equal since they contain no content 
(see Inputs/shared.s)
+# LOCAL-TINY: OffsetInfo Type  
  Symbol's Value   Symbol's Name + Addend
+# LOCAL-TINY-NEXT:00020320  0411 
R_AARCH64_AUTH_RELATIVE 10260
+# LOCAL-TINY-NEXT:00020328  0411 
R_AARCH64_AUTH_RELATIVE 10260
+
+# EXTERNAL-TINY:  Hex dump of section '.got':
+# EXTERNAL-TINY-NEXT: 0x00020380  0080  00a0
+#   ^^
#   0b1000 bit 63 address diversity = true, bits 61..60 key = IA
# ^^
# 0b1010 bit 63 address diversity = true, bits 61..60 key = DA

ilovepi wrote:

Ah, thanks for the explanation. I think I made a similar comment in one of the other PRs before I saw this. Feel free to ignore that/mark it resolved.

https://github.com/llvm/llvm-project/pull/113816


[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed GOT with tiny code model (PR #113816)

2024-11-01 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi commented:

Again, LGTM, but let's get another maintainer to take a look before landing.

https://github.com/llvm/llvm-project/pull/113816


[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed GOT with tiny code model (PR #113816)

2024-11-01 Thread Paul Kirth via llvm-branch-commits

ilovepi wrote:

> Again, LGTM, but lets get another maintainer to take a look before landing.

Well, assuming presubmit is working. I see a number of test failures ATM.

https://github.com/llvm/llvm-project/pull/113816


[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to early simplification EP callbacks (PR #114547)

2024-11-01 Thread Shilei Tian via llvm-branch-commits


@@ -821,8 +825,15 @@ void 
AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
   PM.addPass(AMDGPUSwLowerLDSPass(*this));
 if (EnableLowerModuleLDS)
   PM.addPass(AMDGPULowerModuleLDSPass(*this));
-if (EnableAMDGPUAttributor && Level != OptimizationLevel::O0)
-  PM.addPass(AMDGPUAttributorPass(*this));
+if (Level != OptimizationLevel::O0) {
+  if (EnableAMDGPUAttributor)
+PM.addPass(AMDGPUAttributorPass(*this));
+  // Do we really need internalization in LTO?
+  if (InternalizeSymbols) {

shiltian wrote:

This needs to be moved before the attributor.

https://github.com/llvm/llvm-project/pull/114547


[llvm-branch-commits] [RISCV] Support memcmp expansion for vectors (PR #114517)

2024-11-01 Thread Pengcheng Wang via llvm-branch-commits

https://github.com/wangpc-pp edited 
https://github.com/llvm/llvm-project/pull/114517


[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to early simplification EP callbacks (PR #114547)

2024-11-01 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/114547

>From 912283a403e1a3a95ebead98467cc743024b5455 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Fri, 1 Nov 2024 10:51:20 -0400
Subject: [PATCH] [PassBuilder] Add `LTOPreLink` to early simplification EP callbacks

The early simplification pipeline is used in non-LTO mode and in the (Thin/Full)LTO
pre-link stage. Some passes should run in non-LTO mode but not at the LTO
pre-link stage, and that control is currently missing. This PR adds the support. To
demonstrate its use, we enable the internalization pass for AMDGPU in non-LTO mode
only, because running it at the pre-link stage causes some issues.
---
 clang/lib/CodeGen/BackendUtil.cpp |  3 ++-
 llvm/include/llvm/Passes/PassBuilder.h| 10 +++---
 llvm/lib/Passes/PassBuilderPipelines.cpp  |  8 
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 19 +++
 llvm/lib/Target/BPF/BPFTargetMachine.cpp  |  2 +-
 .../CodeGen/AMDGPU/print-pipeline-passes.ll   |  8 
 llvm/tools/opt/NewPMDriver.cpp|  2 +-
 7 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/clang/lib/CodeGen/BackendUtil.cpp 
b/clang/lib/CodeGen/BackendUtil.cpp
index ae33554a66b6b5..47a30f00612eb7 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -993,7 +993,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   createModuleToFunctionPassAdaptor(ObjCARCExpandPass()));
   });
   PB.registerPipelineEarlySimplificationEPCallback(
-  [](ModulePassManager &MPM, OptimizationLevel Level) {
+  [](ModulePassManager &MPM, OptimizationLevel Level,
+ ThinOrFullLTOPhase) {
 if (Level != OptimizationLevel::O0)
   MPM.addPass(ObjCARCAPElimPass());
   });
diff --git a/llvm/include/llvm/Passes/PassBuilder.h 
b/llvm/include/llvm/Passes/PassBuilder.h
index 0ebfdbb7865fdd..565fd2ab2147e5 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -480,7 +480,8 @@ class PassBuilder {
   /// This extension point allows adding optimization right after passes that 
do
   /// basic simplification of the input IR.
   void registerPipelineEarlySimplificationEPCallback(
-  const std::function<void(ModulePassManager &, OptimizationLevel)> &C) {
+  const std::function<void(ModulePassManager &, OptimizationLevel, ThinOrFullLTOPhase)> &C) {
 PipelineEarlySimplificationEPCallbacks.push_back(C);
   }
 
@@ -639,7 +640,8 @@ class PassBuilder {
   void invokePipelineStartEPCallbacks(ModulePassManager &MPM,
   OptimizationLevel Level);
   void invokePipelineEarlySimplificationEPCallbacks(ModulePassManager &MPM,
-OptimizationLevel Level);
+OptimizationLevel Level,
+ThinOrFullLTOPhase Phase);
 
   static bool checkParametrizedPassName(StringRef Name, StringRef PassName) {
 if (!Name.consume_front(PassName))
@@ -764,7 +766,9 @@ class PassBuilder {
   FullLinkTimeOptimizationLastEPCallbacks;
   SmallVector<std::function<void(ModulePassManager &, OptimizationLevel)>, 2>
   PipelineStartEPCallbacks;
-  SmallVector<std::function<void(ModulePassManager &, OptimizationLevel)>, 2>
+  SmallVector<std::function<void(ModulePassManager &, OptimizationLevel,
+                                 ThinOrFullLTOPhase)>,
+              2>
   PipelineEarlySimplificationEPCallbacks;
 
   SmallVector, 2>
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp 
b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 466fbcd7bb7703..9c90accd9c376b 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -384,9 +384,9 @@ void 
PassBuilder::invokePipelineStartEPCallbacks(ModulePassManager &MPM,
 C(MPM, Level);
 }
 void PassBuilder::invokePipelineEarlySimplificationEPCallbacks(
-ModulePassManager &MPM, OptimizationLevel Level) {
+ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase Phase) 
{
   for (auto &C : PipelineEarlySimplificationEPCallbacks)
-C(MPM, Level);
+C(MPM, Level, Phase);
 }
 
 // Helper to add AnnotationRemarksPass.
@@ -1140,7 +1140,7 @@ 
PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
 MPM.addPass(LowerTypeTestsPass(nullptr, nullptr,
lowertypetests::DropTestKind::Assume));
 
-  invokePipelineEarlySimplificationEPCallbacks(MPM, Level);
+  invokePipelineEarlySimplificationEPCallbacks(MPM, Level, Phase);
 
   // Interprocedural constant propagation now that basic cleanup has occurred
   // and prior to optimizing globals.
@@ -2153,7 +2153,7 @@ PassBuilder::buildO0DefaultPipeline(OptimizationLevel 
Level,
   if (PGOOpt && PGOOpt->DebugInfoForProfiling)
 MPM.addPass(createModuleToFunctionPassAdaptor(AddDiscriminatorsPass()));
 
-  invokePipelineEarlySimplificationEPCallbacks(MPM, Level);
+  invokePipelineEarlySimplificationEPCallbacks(MPM, Level, Phase);
 
   // Build a minimal pipeline based on the semantics required by LLVM,
   // which is just that always inlining occurs. Further, disable generating
diff --
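The shape of this change can be illustrated with a self-contained model of the extension-point machinery. The real code uses `llvm::ModulePassManager`, `OptimizationLevel`, and `llvm::ThinOrFullLTOPhase`; this sketch drops `OptimizationLevel`, records pass names as strings, and all `*Model` names are invented:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class ThinOrFullLTOPhase { None, ThinLTOPreLink, FullLTOPreLink };

// Toy pass manager that just records which passes were scheduled.
struct ModulePassManagerModel {
  std::vector<std::string> Passes;
};

using EPCallback =
    std::function<void(ModulePassManagerModel &, ThinOrFullLTOPhase)>;

struct PassBuilderModel {
  std::vector<EPCallback> EarlySimplificationEPCallbacks;

  void registerPipelineEarlySimplificationEPCallback(EPCallback C) {
    EarlySimplificationEPCallbacks.push_back(std::move(C));
  }
  void invokePipelineEarlySimplificationEPCallbacks(
      ModulePassManagerModel &MPM, ThinOrFullLTOPhase Phase) {
    for (auto &C : EarlySimplificationEPCallbacks)
      C(MPM, Phase);
  }
};

// With the phase threaded through, a target can run a pass in non-LTO
// compiles only, e.g. internalization for AMDGPU:
void registerTargetCallbacks(PassBuilderModel &PB) {
  PB.registerPipelineEarlySimplificationEPCallback(
      [](ModulePassManagerModel &MPM, ThinOrFullLTOPhase Phase) {
        if (Phase == ThinOrFullLTOPhase::None)
          MPM.Passes.push_back("internalize");
      });
}
```

Before this change, the callback signature had no phase parameter, so the lambda above had no way to tell a plain `-O2` compile apart from a ThinLTO pre-link compile.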

[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline (PR #114577)

2024-11-01 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-codegen

Author: Shilei Tian (shiltian)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/114577.diff


6 Files Affected:

- (modified) clang/lib/CodeGen/BackendUtil.cpp (+12-10) 
- (modified) llvm/include/llvm/Passes/PassBuilder.h (+14-6) 
- (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+14-10) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+10-6) 
- (modified) llvm/test/CodeGen/AMDGPU/print-pipeline-passes.ll (+1) 
- (modified) llvm/tools/opt/NewPMDriver.cpp (+2-2) 


``diff
diff --git a/clang/lib/CodeGen/BackendUtil.cpp 
b/clang/lib/CodeGen/BackendUtil.cpp
index 47a30f00612eb7..70035a5e069a90 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -674,7 +674,7 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 
   // Ensure we lower KCFI operand bundles with -O0.
   PB.registerOptimizerLastEPCallback(
-  [&](ModulePassManager &MPM, OptimizationLevel Level) {
+  [&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) 
{
 if (Level == OptimizationLevel::O0 &&
 LangOpts.Sanitize.has(SanitizerKind::KCFI))
   MPM.addPass(createModuleToFunctionPassAdaptor(KCFIPass()));
@@ -693,8 +693,8 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 static void addSanitizers(const Triple &TargetTriple,
   const CodeGenOptions &CodeGenOpts,
   const LangOptions &LangOpts, PassBuilder &PB) {
-  auto SanitizersCallback = [&](ModulePassManager &MPM,
-OptimizationLevel Level) {
+  auto SanitizersCallback = [&](ModulePassManager &MPM, OptimizationLevel 
Level,
+ThinOrFullLTOPhase) {
 if (CodeGenOpts.hasSanitizeCoverage()) {
   auto SancovOpts = getSancovOptsFromCGOpts(CodeGenOpts);
   MPM.addPass(SanitizerCoveragePass(
@@ -778,9 +778,10 @@ static void addSanitizers(const Triple &TargetTriple,
   };
   if (ClSanitizeOnOptimizerEarlyEP) {
 PB.registerOptimizerEarlyEPCallback(
-[SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level) {
+[SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level,
+ ThinOrFullLTOPhase Phase) {
   ModulePassManager NewMPM;
-  SanitizersCallback(NewMPM, Level);
+  SanitizersCallback(NewMPM, Level, Phase);
   if (!NewMPM.isEmpty()) {
 // Sanitizers can abandon.
 NewMPM.addPass(RequireAnalysisPass());
@@ -1058,11 +1059,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
 // TODO: Consider passing the MemoryProfileOutput to the pass builder via
 // the PGOOptions, and set this up there.
 if (!CodeGenOpts.MemoryProfileOutput.empty()) {
-  PB.registerOptimizerLastEPCallback(
-  [](ModulePassManager &MPM, OptimizationLevel Level) {
-MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
-MPM.addPass(ModuleMemProfilerPass());
-  });
+  PB.registerOptimizerLastEPCallback([](ModulePassManager &MPM,
+OptimizationLevel Level,
+ThinOrFullLTOPhase) {
+MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
+MPM.addPass(ModuleMemProfilerPass());
+  });
 }
 
 if (CodeGenOpts.FatLTO) {
diff --git a/llvm/include/llvm/Passes/PassBuilder.h 
b/llvm/include/llvm/Passes/PassBuilder.h
index 565fd2ab2147e5..e7bc3a58f414f1 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -490,7 +490,8 @@ class PassBuilder {
   /// This extension point allows adding optimizations before the function
   /// optimization pipeline.
   void registerOptimizerEarlyEPCallback(
-  const std::function<void(ModulePassManager &, OptimizationLevel)> &C) {
+  const std::function<void(ModulePassManager &, OptimizationLevel,
+   ThinOrFullLTOPhase)> &C) {
 OptimizerEarlyEPCallbacks.push_back(C);
   }
 
@@ -499,7 +500,8 @@ class PassBuilder {
   /// This extension point allows adding optimizations at the very end of the
   /// function optimization pipeline.
   void registerOptimizerLastEPCallback(
-  const std::function<void(ModulePassManager &, OptimizationLevel)> &C) {
+  const std::function<void(ModulePassManager &, OptimizationLevel,
+   ThinOrFullLTOPhase)> &C) {
 OptimizerLastEPCallbacks.push_back(C);
   }
 
@@ -630,9 +632,11 @@ class PassBuilder {
   void invokeVectorizerStartEPCallbacks(FunctionPassManager &FPM,
 OptimizationLevel Level);
   void invokeOptimizerEarlyEPCallbacks(ModulePassManager &MPM,
-   OptimizationLevel Level);
+   OptimizationLevel Level,
+   ThinOrFullLTOPhase Phase);
   void invokeOptimizerLastEPCallbacks(ModulePassManager &MPM,
-  OptimizationLevel Level);
+  Optimizatio

[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline (PR #114577)

2024-11-01 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian created 
https://github.com/llvm/llvm-project/pull/114577

None

>From dc94afc308989a4dbaee911f93f1cc1855bd7c55 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Fri, 1 Nov 2024 12:39:52 -0400
Subject: [PATCH] [PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline

---
 clang/lib/CodeGen/BackendUtil.cpp | 22 +
 llvm/include/llvm/Passes/PassBuilder.h| 20 +++-
 llvm/lib/Passes/PassBuilderPipelines.cpp  | 24 +++
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 16 -
 .../CodeGen/AMDGPU/print-pipeline-passes.ll   |  1 +
 llvm/tools/opt/NewPMDriver.cpp|  4 ++--
 6 files changed, 53 insertions(+), 34 deletions(-)

diff --git a/clang/lib/CodeGen/BackendUtil.cpp 
b/clang/lib/CodeGen/BackendUtil.cpp
index 47a30f00612eb7..70035a5e069a90 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -674,7 +674,7 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 
   // Ensure we lower KCFI operand bundles with -O0.
   PB.registerOptimizerLastEPCallback(
-  [&](ModulePassManager &MPM, OptimizationLevel Level) {
+  [&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) 
{
 if (Level == OptimizationLevel::O0 &&
 LangOpts.Sanitize.has(SanitizerKind::KCFI))
   MPM.addPass(createModuleToFunctionPassAdaptor(KCFIPass()));
@@ -693,8 +693,8 @@ static void addKCFIPass(const Triple &TargetTriple, const 
LangOptions &LangOpts,
 static void addSanitizers(const Triple &TargetTriple,
   const CodeGenOptions &CodeGenOpts,
   const LangOptions &LangOpts, PassBuilder &PB) {
-  auto SanitizersCallback = [&](ModulePassManager &MPM,
-OptimizationLevel Level) {
+  auto SanitizersCallback = [&](ModulePassManager &MPM, OptimizationLevel 
Level,
+ThinOrFullLTOPhase) {
 if (CodeGenOpts.hasSanitizeCoverage()) {
   auto SancovOpts = getSancovOptsFromCGOpts(CodeGenOpts);
   MPM.addPass(SanitizerCoveragePass(
@@ -778,9 +778,10 @@ static void addSanitizers(const Triple &TargetTriple,
   };
   if (ClSanitizeOnOptimizerEarlyEP) {
 PB.registerOptimizerEarlyEPCallback(
-[SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level) {
+[SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level,
+ ThinOrFullLTOPhase Phase) {
   ModulePassManager NewMPM;
-  SanitizersCallback(NewMPM, Level);
+  SanitizersCallback(NewMPM, Level, Phase);
   if (!NewMPM.isEmpty()) {
 // Sanitizers can abandon<GlobalsAA>.
 NewMPM.addPass(RequireAnalysisPass<GlobalsAA, Module>());
@@ -1058,11 +1059,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
 // TODO: Consider passing the MemoryProfileOutput to the pass builder via
 // the PGOOptions, and set this up there.
 if (!CodeGenOpts.MemoryProfileOutput.empty()) {
-  PB.registerOptimizerLastEPCallback(
-  [](ModulePassManager &MPM, OptimizationLevel Level) {
-MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
-MPM.addPass(ModuleMemProfilerPass());
-  });
+  PB.registerOptimizerLastEPCallback([](ModulePassManager &MPM,
+OptimizationLevel Level,
+ThinOrFullLTOPhase) {
+MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
+MPM.addPass(ModuleMemProfilerPass());
+  });
 }
 
 if (CodeGenOpts.FatLTO) {
diff --git a/llvm/include/llvm/Passes/PassBuilder.h 
b/llvm/include/llvm/Passes/PassBuilder.h
index 565fd2ab2147e5..e7bc3a58f414f1 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -490,7 +490,8 @@ class PassBuilder {
   /// This extension point allows adding optimizations before the function
   /// optimization pipeline.
   void registerOptimizerEarlyEPCallback(
-  const std::function<void(ModulePassManager &, OptimizationLevel)> &C) {
+  const std::function<void(ModulePassManager &, OptimizationLevel,
+   ThinOrFullLTOPhase)> &C) {
 OptimizerEarlyEPCallbacks.push_back(C);
   }
 
@@ -499,7 +500,8 @@ class PassBuilder {
   /// This extension point allows adding optimizations at the very end of the
   /// function optimization pipeline.
   void registerOptimizerLastEPCallback(
-  const std::function<void(ModulePassManager &, OptimizationLevel)> &C) {
+  const std::function<void(ModulePassManager &, OptimizationLevel,
+   ThinOrFullLTOPhase)> &C) {
 OptimizerLastEPCallbacks.push_back(C);
   }
 
@@ -630,9 +632,11 @@ class PassBuilder {
   void invokeVectorizerStartEPCallbacks(FunctionPassManager &FPM,
 OptimizationLevel Level);
   void invokeOptimizerEarlyEPCallbacks(ModulePassManager &MPM,
-   OptimizationLevel Level);
+   OptimizationLevel Level,
+   ThinOrFullLTOPhase Phase);
   void inv

[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline (PR #114577)

2024-11-01 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/114577
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#114577** 👈
* **#114547**
* **#114564**
* `main`

This stack of pull requests is managed by Graphite. Learn more about
stacking: https://stacking.dev/



https://github.com/llvm/llvm-project/pull/114577
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (PR #114547)

2024-11-01 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian edited 
https://github.com/llvm/llvm-project/pull/114547
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (PR #114547)

2024-11-01 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/114547

>From e753c4fadf85f1730a458804bec41d32df5a692b Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Fri, 1 Nov 2024 10:51:20 -0400
Subject: [PATCH] [PassBuilder] Add `LTOPreLink` to early simplication EP call
 backs

The early simplification pipeline is used both in non-LTO mode and in the
(Thin/Full)LTO pre-link stage. Some passes should run in non-LTO mode but not
in the LTO pre-link stage, and that control is currently missing. This PR adds
support for it. To demonstrate its use, we enable the internalization pass only
in non-LTO mode for AMDGPU, because running it in the pre-link stage causes
some issues.
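
To illustrate the shape of this change (illustration only, not part of the patch): the extension-point callback type gains a phase parameter, so a registered callback can decide to skip work during the LTO pre-link stage. Every name below is a hypothetical stand-in for the corresponding LLVM type, kept minimal so the sketch compiles in isolation:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins for the LLVM types involved.
enum class ThinOrFullLTOPhase { None, ThinLTOPreLink, FullLTOPreLink };
enum class OptimizationLevel { O0, O2 };

struct ModulePassManager {
  std::vector<const char *> Passes;
  void addPass(const char *Name) { Passes.push_back(Name); }
};

// The callback type after this patch: it now also receives the LTO phase.
using EPCallback = std::function<void(ModulePassManager &, OptimizationLevel,
                                      ThinOrFullLTOPhase)>;

inline bool isLTOPreLink(ThinOrFullLTOPhase Phase) {
  return Phase != ThinOrFullLTOPhase::None;
}

// Returns how many passes the registered callbacks added for a given
// level/phase combination.
inline std::size_t runEarlySimplificationEP(OptimizationLevel Level,
                                            ThinOrFullLTOPhase Phase) {
  std::vector<EPCallback> Callbacks;
  // A target-style callback that internalizes only outside LTO pre-link,
  // mirroring the AMDGPU use case described in the commit message.
  Callbacks.push_back([](ModulePassManager &MPM, OptimizationLevel L,
                         ThinOrFullLTOPhase P) {
    if (L != OptimizationLevel::O0 && !isLTOPreLink(P))
      MPM.addPass("internalize");
  });
  ModulePassManager MPM;
  for (auto &C : Callbacks)
    C(MPM, Level, Phase);
  return MPM.Passes.size();
}
```

The same callback now behaves differently in a plain `-O2` compile (pass added) and in a ThinLTO pre-link compile (pass skipped), which is exactly the control the patch description asks for.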
---
 clang/lib/CodeGen/BackendUtil.cpp |  3 ++-
 llvm/include/llvm/Passes/PassBuilder.h| 10 +++---
 llvm/lib/Passes/PassBuilderPipelines.cpp  |  8 
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 19 +++
 llvm/lib/Target/BPF/BPFTargetMachine.cpp  |  2 +-
 .../CodeGen/AMDGPU/print-pipeline-passes.ll   |  8 
 llvm/tools/opt/NewPMDriver.cpp|  2 +-
 7 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/clang/lib/CodeGen/BackendUtil.cpp 
b/clang/lib/CodeGen/BackendUtil.cpp
index ae33554a66b6b5..47a30f00612eb7 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -993,7 +993,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   createModuleToFunctionPassAdaptor(ObjCARCExpandPass()));
   });
   PB.registerPipelineEarlySimplificationEPCallback(
-  [](ModulePassManager &MPM, OptimizationLevel Level) {
+  [](ModulePassManager &MPM, OptimizationLevel Level,
+ ThinOrFullLTOPhase) {
 if (Level != OptimizationLevel::O0)
   MPM.addPass(ObjCARCAPElimPass());
   });
diff --git a/llvm/include/llvm/Passes/PassBuilder.h 
b/llvm/include/llvm/Passes/PassBuilder.h
index 0ebfdbb7865fdd..565fd2ab2147e5 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -480,7 +480,8 @@ class PassBuilder {
   /// This extension point allows adding optimization right after passes that 
do
   /// basic simplification of the input IR.
   void registerPipelineEarlySimplificationEPCallback(
-  const std::function<void(ModulePassManager &, OptimizationLevel)> &C) {
+  const std::function<void(ModulePassManager &, OptimizationLevel,
+   ThinOrFullLTOPhase)> &C) {
 PipelineEarlySimplificationEPCallbacks.push_back(C);
   }
 
@@ -639,7 +640,8 @@ class PassBuilder {
   void invokePipelineStartEPCallbacks(ModulePassManager &MPM,
   OptimizationLevel Level);
   void invokePipelineEarlySimplificationEPCallbacks(ModulePassManager &MPM,
-OptimizationLevel Level);
+OptimizationLevel Level,
+ThinOrFullLTOPhase Phase);
 
   static bool checkParametrizedPassName(StringRef Name, StringRef PassName) {
 if (!Name.consume_front(PassName))
@@ -764,7 +766,9 @@ class PassBuilder {
   FullLinkTimeOptimizationLastEPCallbacks;
   SmallVector<std::function<void(ModulePassManager &, OptimizationLevel)>, 2>
   PipelineStartEPCallbacks;
-  SmallVector<std::function<void(ModulePassManager &, OptimizationLevel)>, 2>
+  SmallVector<std::function<void(ModulePassManager &, OptimizationLevel,
+   ThinOrFullLTOPhase)>,
+  2>
   PipelineEarlySimplificationEPCallbacks;
 
   SmallVector, 2>
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp 
b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 7c512ab15a6d38..bfb9678678f18a 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -384,9 +384,9 @@ void 
PassBuilder::invokePipelineStartEPCallbacks(ModulePassManager &MPM,
 C(MPM, Level);
 }
 void PassBuilder::invokePipelineEarlySimplificationEPCallbacks(
-ModulePassManager &MPM, OptimizationLevel Level) {
+ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase Phase) 
{
   for (auto &C : PipelineEarlySimplificationEPCallbacks)
-C(MPM, Level);
+C(MPM, Level, Phase);
 }
 
 // Helper to add AnnotationRemarksPass.
@@ -1140,7 +1140,7 @@ 
PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
 MPM.addPass(LowerTypeTestsPass(nullptr, nullptr,
lowertypetests::DropTestKind::Assume));
 
-  invokePipelineEarlySimplificationEPCallbacks(MPM, Level);
+  invokePipelineEarlySimplificationEPCallbacks(MPM, Level, Phase);
 
   // Interprocedural constant propagation now that basic cleanup has occurred
   // and prior to optimizing globals.
@@ -2155,7 +2155,7 @@ PassBuilder::buildO0DefaultPipeline(OptimizationLevel 
Level,
   if (PGOOpt && PGOOpt->DebugInfoForProfiling)
 MPM.addPass(createModuleToFunctionPassAdaptor(AddDiscriminatorsPass()));
 
-  invokePipelineEarlySimplificationEPCallbacks(MPM, Level);
+  invokePipelineEarlySimplificationEPCallbacks(MPM, Level, Phase);
 
   // Build a minimal pipeline based on the semantics required by LLVM,
   // which is just that always inlining occurs. Further, disable generating
diff --

[llvm-branch-commits] [flang] [flang][cuda] Data transfer with descriptor (PR #114302)

2024-11-01 Thread Valentin Clement バレンタイン クレメン via llvm-branch-commits

https://github.com/clementval closed 
https://github.com/llvm/llvm-project/pull/114302
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] 704c0b8 - Revert "[flang][runtime][NFC] Allow different memmove function in assign (#11…"

2024-11-01 Thread via llvm-branch-commits

Author: Valentin Clement (バレンタイン クレメン)
Date: 2024-11-01T10:39:56-07:00
New Revision: 704c0b8e429443150ef4b58fc654ef6087f90e03

URL: 
https://github.com/llvm/llvm-project/commit/704c0b8e429443150ef4b58fc654ef6087f90e03
DIFF: 
https://github.com/llvm/llvm-project/commit/704c0b8e429443150ef4b58fc654ef6087f90e03.diff

LOG: Revert "[flang][runtime][NFC] Allow different memmove function in assign 
(#11…"

This reverts commit b278fe3297557c8db492e2d90b4ea9fe683fa479.

Added: 


Modified: 
flang/include/flang/Runtime/assign.h
flang/runtime/assign.cpp

Removed: 



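For context, the reverted diff below had introduced a pluggable-memmove customization point in `Assign`. The pattern can be sketched standalone as follows (hypothetical names except `MemmoveFct` and the default-argument shape, which come from the diff):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// A caller-injectable memmove, defaulting to a wrapper over the standard
// one -- the same shape as the reverted `MemmoveFct memmoveFct =
// &MemmoveWrapper` parameter.
using MemmoveFct = void *(*)(void *, const void *, std::size_t);

inline void *memmoveWrapper(void *dest, const void *src, std::size_t count) {
  return std::memmove(dest, src, count);
}

// Stand-in for an assignment helper that copies raw bytes through the
// injected function pointer instead of calling memmove directly.
inline void copyElement(void *dest, const void *src, std::size_t bytes,
                        MemmoveFct memmoveFct = &memmoveWrapper) {
  memmoveFct(dest, src, bytes);
}
```

This lets an embedder substitute, e.g., a device-safe memmove without changing every call site; the revert returns those call sites to direct `Fortran::runtime::memmove` calls.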

diff  --git a/flang/include/flang/Runtime/assign.h 
b/flang/include/flang/Runtime/assign.h
index 331ec0516dd2d5..a1cc9eaf4355f6 100644
--- a/flang/include/flang/Runtime/assign.h
+++ b/flang/include/flang/Runtime/assign.h
@@ -24,35 +24,11 @@
 #define FORTRAN_RUNTIME_ASSIGN_H_
 
 #include "flang/Runtime/entry-names.h"
-#include "flang/Runtime/freestanding-tools.h"
 
 namespace Fortran::runtime {
 class Descriptor;
-class Terminator;
-
-enum AssignFlags {
-  NoAssignFlags = 0,
-  MaybeReallocate = 1 << 0,
-  NeedFinalization = 1 << 1,
-  CanBeDefinedAssignment = 1 << 2,
-  ComponentCanBeDefinedAssignment = 1 << 3,
-  ExplicitLengthCharacterLHS = 1 << 4,
-  PolymorphicLHS = 1 << 5,
-  DeallocateLHS = 1 << 6
-};
-
-using MemmoveFct = void *(*)(void *, const void *, std::size_t);
-
-static RT_API_ATTRS void *MemmoveWrapper(
-void *dest, const void *src, std::size_t count) {
-  return Fortran::runtime::memmove(dest, src, count);
-}
-
-RT_API_ATTRS void Assign(Descriptor &to, const Descriptor &from,
-Terminator &terminator, int flags, MemmoveFct memmoveFct = 
&MemmoveWrapper);
 
 extern "C" {
-
 // API for lowering assignment
 void RTDECL(Assign)(Descriptor &to, const Descriptor &from,
 const char *sourceFile = nullptr, int sourceLine = 0);

diff  --git a/flang/runtime/assign.cpp b/flang/runtime/assign.cpp
index 8f31fc4d127168..d558ada51cd21a 100644
--- a/flang/runtime/assign.cpp
+++ b/flang/runtime/assign.cpp
@@ -17,6 +17,17 @@
 
 namespace Fortran::runtime {
 
+enum AssignFlags {
+  NoAssignFlags = 0,
+  MaybeReallocate = 1 << 0,
+  NeedFinalization = 1 << 1,
+  CanBeDefinedAssignment = 1 << 2,
+  ComponentCanBeDefinedAssignment = 1 << 3,
+  ExplicitLengthCharacterLHS = 1 << 4,
+  PolymorphicLHS = 1 << 5,
+  DeallocateLHS = 1 << 6
+};
+
 // Predicate: is the left-hand side of an assignment an allocated allocatable
 // that must be deallocated?
 static inline RT_API_ATTRS bool MustDeallocateLHS(
@@ -239,8 +250,8 @@ static RT_API_ATTRS void 
BlankPadCharacterAssignment(Descriptor &to,
 // of elements, but their shape need not to conform (the assignment is done in
 // element sequence order). This facilitates some internal usages, like when
 // dealing with array constructors.
-RT_API_ATTRS void Assign(Descriptor &to, const Descriptor &from,
-Terminator &terminator, int flags, MemmoveFct memmoveFct) {
+RT_API_ATTRS static void Assign(
+Descriptor &to, const Descriptor &from, Terminator &terminator, int flags) 
{
   bool mustDeallocateLHS{(flags & DeallocateLHS) ||
   MustDeallocateLHS(to, from, terminator, flags)};
   DescriptorAddendum *toAddendum{to.Addendum()};
@@ -412,14 +423,14 @@ RT_API_ATTRS void Assign(Descriptor &to, const Descriptor 
&from,
 Assign(toCompDesc, fromCompDesc, terminator, nestedFlags);
   } else { // Component has intrinsic type; simply copy raw bytes
 std::size_t componentByteSize{comp.SizeInBytes(to)};
-memmoveFct(to.Element(toAt) + comp.offset(),
+Fortran::runtime::memmove(to.Element(toAt) + comp.offset(),
 from.Element(fromAt) + comp.offset(),
 componentByteSize);
   }
   break;
 case typeInfo::Component::Genre::Pointer: {
   std::size_t componentByteSize{comp.SizeInBytes(to)};
-  memmoveFct(to.Element(toAt) + comp.offset(),
+  Fortran::runtime::memmove(to.Element(toAt) + comp.offset(),
   from.Element(fromAt) + comp.offset(),
   componentByteSize);
 } break;
@@ -465,14 +476,14 @@ RT_API_ATTRS void Assign(Descriptor &to, const Descriptor 
&from,
 const auto &procPtr{
 *procPtrDesc.ZeroBasedIndexedElement<typeInfo::ProcedurePointer>(
 k)};
-memmoveFct(to.Element(toAt) + procPtr.offset,
+Fortran::runtime::memmove(to.Element(toAt) + procPtr.offset,
 from.Element(fromAt) + procPtr.offset,
 sizeof(typeInfo::ProcedurePointer));
   }
 }
   } else { // intrinsic type, intrinsic assignment
 if (isSimpleMemmove()) {
-  memmoveFct(to.raw().base_addr, from.raw().base_addr,
+  Fortran::runtime::memmove(to.raw().base_addr, from.raw().base_addr,
   toElements * toElementBytes);
 } else if (toElementBytes > fromElementBytes) { // blank padding
   switch (to.type().raw()) {
@@ -496,8 +

[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (PR #114547)

2024-11-01 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/114547

>From 8ae74a4c6a96eb0c44668d571aa61116eaa48cbe Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Fri, 1 Nov 2024 10:51:20 -0400
Subject: [PATCH] [PassBuilder] Add `LTOPreLink` to early simplication EP call
 backs

The early simplification pipeline is used both in non-LTO mode and in the
(Thin/Full)LTO pre-link stage. Some passes should run in non-LTO mode but not
in the LTO pre-link stage, and that control is currently missing. This PR adds
support for it. To demonstrate its use, we enable the internalization pass only
in non-LTO mode for AMDGPU, because running it in the pre-link stage causes
some issues.
---
 clang/lib/CodeGen/BackendUtil.cpp |  3 ++-
 llvm/include/llvm/Passes/PassBuilder.h| 12 
 llvm/lib/Passes/PassBuilderPipelines.cpp  |  8 
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 19 +++
 llvm/lib/Target/BPF/BPFTargetMachine.cpp  |  2 +-
 .../CodeGen/AMDGPU/print-pipeline-passes.ll   |  8 
 llvm/tools/opt/NewPMDriver.cpp|  2 +-
 7 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/clang/lib/CodeGen/BackendUtil.cpp 
b/clang/lib/CodeGen/BackendUtil.cpp
index ae33554a66b6b5..47a30f00612eb7 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -993,7 +993,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   createModuleToFunctionPassAdaptor(ObjCARCExpandPass()));
   });
   PB.registerPipelineEarlySimplificationEPCallback(
-  [](ModulePassManager &MPM, OptimizationLevel Level) {
+  [](ModulePassManager &MPM, OptimizationLevel Level,
+ ThinOrFullLTOPhase) {
 if (Level != OptimizationLevel::O0)
   MPM.addPass(ObjCARCAPElimPass());
   });
diff --git a/llvm/include/llvm/Passes/PassBuilder.h 
b/llvm/include/llvm/Passes/PassBuilder.h
index 0ebfdbb7865fdd..268df03615db23 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -480,7 +480,8 @@ class PassBuilder {
   /// This extension point allows adding optimization right after passes that 
do
   /// basic simplification of the input IR.
   void registerPipelineEarlySimplificationEPCallback(
-  const std::function<void(ModulePassManager &, OptimizationLevel)> &C) {
+  const std::function<void(ModulePassManager &, OptimizationLevel,
+   ThinOrFullLTOPhase)> &C) {
 PipelineEarlySimplificationEPCallbacks.push_back(C);
   }
 
@@ -638,8 +639,9 @@ class PassBuilder {
  OptimizationLevel Level);
   void invokePipelineStartEPCallbacks(ModulePassManager &MPM,
   OptimizationLevel Level);
-  void invokePipelineEarlySimplificationEPCallbacks(ModulePassManager &MPM,
-OptimizationLevel Level);
+  void invokePipelineEarlySimplificationEPCallbacks(
+  ModulePassManager &MPM, OptimizationLevel Level,
+  ThinOrFullLTOPhase Phase = ThinOrFullLTOPhase::None);
 
   static bool checkParametrizedPassName(StringRef Name, StringRef PassName) {
 if (!Name.consume_front(PassName))
@@ -764,7 +766,9 @@ class PassBuilder {
   FullLinkTimeOptimizationLastEPCallbacks;
   SmallVector<std::function<void(ModulePassManager &, OptimizationLevel)>, 2>
   PipelineStartEPCallbacks;
-  SmallVector<std::function<void(ModulePassManager &, OptimizationLevel)>, 2>
+  SmallVector<std::function<void(ModulePassManager &, OptimizationLevel,
+   ThinOrFullLTOPhase)>,
+  2>
   PipelineEarlySimplificationEPCallbacks;
 
   SmallVector, 2>
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp 
b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 7c512ab15a6d38..bfb9678678f18a 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -384,9 +384,9 @@ void 
PassBuilder::invokePipelineStartEPCallbacks(ModulePassManager &MPM,
 C(MPM, Level);
 }
 void PassBuilder::invokePipelineEarlySimplificationEPCallbacks(
-ModulePassManager &MPM, OptimizationLevel Level) {
+ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase Phase) 
{
   for (auto &C : PipelineEarlySimplificationEPCallbacks)
-C(MPM, Level);
+C(MPM, Level, Phase);
 }
 
 // Helper to add AnnotationRemarksPass.
@@ -1140,7 +1140,7 @@ 
PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
 MPM.addPass(LowerTypeTestsPass(nullptr, nullptr,
lowertypetests::DropTestKind::Assume));
 
-  invokePipelineEarlySimplificationEPCallbacks(MPM, Level);
+  invokePipelineEarlySimplificationEPCallbacks(MPM, Level, Phase);
 
   // Interprocedural constant propagation now that basic cleanup has occurred
   // and prior to optimizing globals.
@@ -2155,7 +2155,7 @@ PassBuilder::buildO0DefaultPipeline(OptimizationLevel 
Level,
   if (PGOOpt && PGOOpt->DebugInfoForProfiling)
 MPM.addPass(createModuleToFunctionPassAdaptor(AddDiscriminatorsPass()));
 
-  invokePipelineEarlySimplificationEPCallbacks(MPM, Level);
+  invokePipelineEarlySimplificationEPCallbacks(MPM, Level, Phase);
 
   // Build a minimal pipeline based on the semantics require

[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (PR #114547)

2024-11-01 Thread Matt Arsenault via llvm-branch-commits


@@ -821,8 +825,15 @@ void 
AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
   PM.addPass(AMDGPUSwLowerLDSPass(*this));
 if (EnableLowerModuleLDS)
   PM.addPass(AMDGPULowerModuleLDSPass(*this));
-if (EnableAMDGPUAttributor && Level != OptimizationLevel::O0)
-  PM.addPass(AMDGPUAttributorPass(*this));
+if (Level != OptimizationLevel::O0) {
+  if (EnableAMDGPUAttributor)
+PM.addPass(AMDGPUAttributorPass(*this));
+  // Do we really need internalization in LTO?
+  if (InternalizeSymbols) {

arsenm wrote:

Do we need the custom internalize anymore? I thought this was because of 
mis-set visibility in the libraries, but that was fixed? 

https://github.com/llvm/llvm-project/pull/114547
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (PR #114547)

2024-11-01 Thread Shilei Tian via llvm-branch-commits


@@ -821,8 +825,15 @@ void 
AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
   PM.addPass(AMDGPUSwLowerLDSPass(*this));
 if (EnableLowerModuleLDS)
   PM.addPass(AMDGPULowerModuleLDSPass(*this));
-if (EnableAMDGPUAttributor && Level != OptimizationLevel::O0)
-  PM.addPass(AMDGPUAttributorPass(*this));
+if (Level != OptimizationLevel::O0) {
+  if (EnableAMDGPUAttributor)
+PM.addPass(AMDGPUAttributorPass(*this));
+  // Do we really need internalization in LTO?
+  if (InternalizeSymbols) {

shiltian wrote:

Well, we probably still need that for those that don't use LTO, such as `comgr`.

https://github.com/llvm/llvm-project/pull/114547
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Make `AAAMDWavesPerEU` honor existing attribute (PR #114438)

2024-11-01 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/114438

>From 7181479ee055c0c8d15a674d577a9cd694e21621 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Thu, 31 Oct 2024 12:49:07 -0400
Subject: [PATCH] [WIP][AMDGPU][Attributor] Make `AAAMDWavesPerEU` honor
 existing attribute

---
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp   | 79 +++
 .../annotate-kernel-features-hsa-call.ll  | 46 +--
 .../AMDGPU/attributor-loop-issue-58639.ll |  3 +-
 .../CodeGen/AMDGPU/direct-indirect-call.ll|  3 +-
 .../CodeGen/AMDGPU/propagate-waves-per-eu.ll  | 59 +++---
 .../AMDGPU/remove-no-kernel-id-attribute.ll   |  9 ++-
 .../AMDGPU/uniform-work-group-multistep.ll|  3 +-
 .../uniform-work-group-recursion-test.ll  |  2 +-
 8 files changed, 111 insertions(+), 93 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 066003395af3f2..18b617d17bec5c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -198,6 +198,17 @@ class AMDGPUInformationCache : public InformationCache {
 return ST.getWavesPerEU(F, FlatWorkGroupSize);
   }
 
+  std::optional<std::pair<unsigned, unsigned>>
+  getWavesPerEUAttr(const Function &F) {
+auto Val = AMDGPU::getIntegerPairAttribute(F, "amdgpu-waves-per-eu",
+   /*OnlyFirstRequired=*/true);
+if (Val && Val->second == 0) {
+  const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+  Val->second = ST.getMaxWavesPerEU();
+}
+return Val;
+  }
+
   std::pair<unsigned, unsigned>
   getEffectiveWavesPerEU(const Function &F,
  std::pair<unsigned, unsigned> WavesPerEU,
@@ -768,22 +779,6 @@ struct AAAMDSizeRangeAttribute
/*ForceReplace=*/true);
   }
 
-  ChangeStatus emitAttributeIfNotDefault(Attributor &A, unsigned Min,
- unsigned Max) {
-// Don't add the attribute if it's the implied default.
-if (getAssumed().getLower() == Min && getAssumed().getUpper() - 1 == Max)
-  return ChangeStatus::UNCHANGED;
-
-Function *F = getAssociatedFunction();
-LLVMContext &Ctx = F->getContext();
-SmallString<10> Buffer;
-raw_svector_ostream OS(Buffer);
-OS << getAssumed().getLower() << ',' << getAssumed().getUpper() - 1;
-return A.manifestAttrs(getIRPosition(),
-   {Attribute::get(Ctx, AttrName, OS.str())},
-   /*ForceReplace=*/true);
-  }
-
   const std::string getAsStr(Attributor *) const override {
 std::string Str;
 raw_string_ostream OS(Str);
@@ -880,29 +875,47 @@ struct AAAMDWavesPerEU : public AAAMDSizeRangeAttribute {
   AAAMDWavesPerEU(const IRPosition &IRP, Attributor &A)
   : AAAMDSizeRangeAttribute(IRP, A, "amdgpu-waves-per-eu") {}
 
-  bool isValidState() const override {
-return !Assumed.isEmptySet() && IntegerRangeState::isValidState();
-  }
-
   void initialize(Attributor &A) override {
 Function *F = getAssociatedFunction();
 auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());
 
-if (const auto *AssumedGroupSize = A.getAAFor<AAAMDFlatWorkGroupSize>(
-*this, IRPosition::function(*F), DepClassTy::REQUIRED);
-AssumedGroupSize->isValidState()) {
+auto TakeRange = [&](std::pair<unsigned, unsigned> R) {
+  auto [Min, Max] = R;
+  ConstantRange Range(APInt(32, Min), APInt(32, Max + 1));
+  IntegerRangeState RangeState(Range);
+  clampStateAndIndicateChange(this->getState(), RangeState);
+  indicateOptimisticFixpoint();
+};
 
-  unsigned Min, Max;
-  std::tie(Min, Max) = InfoCache.getWavesPerEU(
-  *F, {AssumedGroupSize->getAssumed().getLower().getZExtValue(),
-   AssumedGroupSize->getAssumed().getUpper().getZExtValue() - 1});
+std::pair<unsigned, unsigned> MaxWavesPerEURange{
+1U, InfoCache.getMaxWavesPerEU(*F)};
 
-  ConstantRange Range(APInt(32, Min), APInt(32, Max + 1));
-  intersectKnown(Range);
+// If the attribute exists, we will honor it if it is not the default.
+if (auto Attr = InfoCache.getWavesPerEUAttr(*F)) {
+  if (*Attr != MaxWavesPerEURange) {
+TakeRange(*Attr);
+return;
+  }
 }
 
-if (AMDGPU::isEntryFunctionCC(F->getCallingConv()))
-  indicatePessimisticFixpoint();
+// Unlike AAAMDFlatWorkGroupSize, it's getting trickier here. Since the
+// calculation of waves per EU involves flat work group size, we can't
+// simply use an assumed flat work group size as a start point, because the
+// update of flat work group size is in an inverse direction of waves per
+// EU. However, we can still do something if it is an entry function. Since
+// an entry function is a terminal node, and flat work group size either
+// from attribute or default will be used anyway, we can take that value 
and
+// calculate the waves per EU based on it. This result can't be updated by
+// no means, but that could still allow us 
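
A minimal, self-contained sketch of the attribute-honoring policy described in the comments above (all names are hypothetical stand-ins, not the patch's actual API):

```cpp
#include <cassert>
#include <optional>
#include <utility>

using Range = std::pair<unsigned, unsigned>;

// If the function already carries an explicit "amdgpu-waves-per-eu" range
// that differs from the implied default [1, MaxWavesPerEU], honor it as-is;
// otherwise fall back to the default range (which remains subject to
// propagation in the real attributor).
inline Range pickWavesPerEURange(std::optional<Range> Attr,
                                 unsigned MaxWavesPerEU) {
  const Range Default{1, MaxWavesPerEU};
  // An upper bound of 0 means "no explicit maximum": widen it to the
  // subtarget maximum, mirroring getWavesPerEUAttr in the patch.
  if (Attr && Attr->second == 0)
    Attr->second = MaxWavesPerEU;
  if (Attr && *Attr != Default)
    return *Attr; // honor the explicit, non-default attribute
  return Default;
}
```

With a subtarget maximum of 8 waves, an explicit `"amdgpu-waves-per-eu"="2,4"` is taken verbatim, `"2"` (no upper bound) becomes `[2, 8]`, and a missing or default attribute yields `[1, 8]`.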

[llvm-branch-commits] [clang] release/19.x: [clang-format] Fix a regression in parsing `switch` in macro call (#114506) (PR #114640)

2024-11-01 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/114640

Backport 6ca816f88d5f0f2032d1610207023133eaf40a1e

Requested by: @owenca

>From 628477ce78cf2460ef3ec075494dcbbb67f8f7c8 Mon Sep 17 00:00:00 2001
From: Owen Pan 
Date: Fri, 1 Nov 2024 18:47:50 -0700
Subject: [PATCH] [clang-format] Fix a regression in parsing `switch` in macro
 call (#114506)

Fixes #114408.

(cherry picked from commit 6ca816f88d5f0f2032d1610207023133eaf40a1e)
---
 clang/lib/Format/UnwrappedLineParser.cpp  | 8 ++--
 clang/unittests/Format/TokenAnnotatorTest.cpp | 7 +++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/clang/lib/Format/UnwrappedLineParser.cpp 
b/clang/lib/Format/UnwrappedLineParser.cpp
index a5268e153bcc5b..bfb592ae074938 100644
--- a/clang/lib/Format/UnwrappedLineParser.cpp
+++ b/clang/lib/Format/UnwrappedLineParser.cpp
@@ -2086,7 +2086,8 @@ void UnwrappedLineParser::parseStructuralElement(
 case tok::kw_switch:
   if (Style.Language == FormatStyle::LK_Java)
 parseSwitch(/*IsExpr=*/true);
-  nextToken();
+  else
+nextToken();
   break;
 case tok::kw_case:
   // Proto: there are no switch/case statements.
@@ -2637,7 +2638,10 @@ bool UnwrappedLineParser::parseParens(TokenType 
AmpAmpTokenType) {
 nextToken();
   break;
 case tok::kw_switch:
-  parseSwitch(/*IsExpr=*/true);
+  if (Style.Language == FormatStyle::LK_Java)
+parseSwitch(/*IsExpr=*/true);
+  else
+nextToken();
   break;
 case tok::kw_requires: {
   auto RequiresToken = FormatTok;
diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp 
b/clang/unittests/Format/TokenAnnotatorTest.cpp
index 4acd900ff061f8..07999116ab0cf0 100644
--- a/clang/unittests/Format/TokenAnnotatorTest.cpp
+++ b/clang/unittests/Format/TokenAnnotatorTest.cpp
@@ -3412,6 +3412,13 @@ TEST_F(TokenAnnotatorTest, TemplateInstantiation) {
   EXPECT_TOKEN(Tokens[18], tok::greater, TT_TemplateCloser);
 }
 
+TEST_F(TokenAnnotatorTest, SwitchInMacroArgument) {
+  auto Tokens = annotate("FOOBAR(switch);\n"
+ "void f() {}");
+  ASSERT_EQ(Tokens.size(), 12u) << Tokens;
+  EXPECT_TOKEN(Tokens[9], tok::l_brace, TT_FunctionLBrace);
+}
+
 } // namespace
 } // namespace format
 } // namespace clang

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/19.x: [clang-format] Fix a regression in parsing `switch` in macro call (#114506) (PR #114640)

2024-11-01 Thread via llvm-branch-commits

llvmbot wrote:

@HazardyKnusperkeks What do you think about merging this PR to the release 
branch?

https://github.com/llvm/llvm-project/pull/114640
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/19.x: [clang-format] Fix a regression in parsing `switch` in macro call (#114506) (PR #114640)

2024-11-01 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/114640
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/19.x: [clang-format] Fix a regression in parsing `switch` in macro call (#114506) (PR #114640)

2024-11-01 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-format

Author: None (llvmbot)


Changes

Backport 6ca816f88d5f0f2032d1610207023133eaf40a1e

Requested by: @owenca

---
Full diff: https://github.com/llvm/llvm-project/pull/114640.diff


2 Files Affected:

- (modified) clang/lib/Format/UnwrappedLineParser.cpp (+6-2) 
- (modified) clang/unittests/Format/TokenAnnotatorTest.cpp (+7) 


``diff
diff --git a/clang/lib/Format/UnwrappedLineParser.cpp b/clang/lib/Format/UnwrappedLineParser.cpp
index a5268e153bcc5b..bfb592ae074938 100644
--- a/clang/lib/Format/UnwrappedLineParser.cpp
+++ b/clang/lib/Format/UnwrappedLineParser.cpp
@@ -2086,7 +2086,8 @@ void UnwrappedLineParser::parseStructuralElement(
 case tok::kw_switch:
   if (Style.Language == FormatStyle::LK_Java)
 parseSwitch(/*IsExpr=*/true);
-  nextToken();
+  else
+nextToken();
   break;
 case tok::kw_case:
   // Proto: there are no switch/case statements.
@@ -2637,7 +2638,10 @@ bool UnwrappedLineParser::parseParens(TokenType AmpAmpTokenType) {
 nextToken();
   break;
 case tok::kw_switch:
-  parseSwitch(/*IsExpr=*/true);
+  if (Style.Language == FormatStyle::LK_Java)
+parseSwitch(/*IsExpr=*/true);
+  else
+nextToken();
   break;
 case tok::kw_requires: {
   auto RequiresToken = FormatTok;
diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp
index 4acd900ff061f8..07999116ab0cf0 100644
--- a/clang/unittests/Format/TokenAnnotatorTest.cpp
+++ b/clang/unittests/Format/TokenAnnotatorTest.cpp
@@ -3412,6 +3412,13 @@ TEST_F(TokenAnnotatorTest, TemplateInstantiation) {
   EXPECT_TOKEN(Tokens[18], tok::greater, TT_TemplateCloser);
 }
 
+TEST_F(TokenAnnotatorTest, SwitchInMacroArgument) {
+  auto Tokens = annotate("FOOBAR(switch);\n"
+ "void f() {}");
+  ASSERT_EQ(Tokens.size(), 12u) << Tokens;
+  EXPECT_TOKEN(Tokens[9], tok::l_brace, TT_FunctionLBrace);
+}
+
 } // namespace
 } // namespace format
 } // namespace clang

``




https://github.com/llvm/llvm-project/pull/114640
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (PR #114547)

2024-11-01 Thread Shilei Tian via llvm-branch-commits


@@ -821,8 +825,15 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
   PM.addPass(AMDGPUSwLowerLDSPass(*this));
 if (EnableLowerModuleLDS)
   PM.addPass(AMDGPULowerModuleLDSPass(*this));
-if (EnableAMDGPUAttributor && Level != OptimizationLevel::O0)
-  PM.addPass(AMDGPUAttributorPass(*this));
+if (Level != OptimizationLevel::O0) {
+  if (EnableAMDGPUAttributor)
+PM.addPass(AMDGPUAttributorPass(*this));
+  // Do we really need internalization in LTO?
+  if (InternalizeSymbols) {

shiltian wrote:

I'm not sure. I can start a PSDB session to test whether it makes a
difference after removing it.

https://github.com/llvm/llvm-project/pull/114547
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Add initial support of memcmp expansion (PR #107548)

2024-11-01 Thread Pengcheng Wang via llvm-branch-commits


@@ -1144,42 +2872,116 @@ entry:
 define i32 @memcmp_size_4(ptr %s1, ptr %s2) nounwind {
 ; CHECK-ALIGNED-RV32-LABEL: memcmp_size_4:
 ; CHECK-ALIGNED-RV32:   # %bb.0: # %entry
-; CHECK-ALIGNED-RV32-NEXT:addi sp, sp, -16
-; CHECK-ALIGNED-RV32-NEXT:sw ra, 12(sp) # 4-byte Folded Spill
-; CHECK-ALIGNED-RV32-NEXT:li a2, 4
-; CHECK-ALIGNED-RV32-NEXT:call memcmp
-; CHECK-ALIGNED-RV32-NEXT:lw ra, 12(sp) # 4-byte Folded Reload
-; CHECK-ALIGNED-RV32-NEXT:addi sp, sp, 16
+; CHECK-ALIGNED-RV32-NEXT:lbu a2, 0(a0)
+; CHECK-ALIGNED-RV32-NEXT:lbu a3, 1(a0)
+; CHECK-ALIGNED-RV32-NEXT:lbu a4, 3(a0)
+; CHECK-ALIGNED-RV32-NEXT:lbu a0, 2(a0)
+; CHECK-ALIGNED-RV32-NEXT:lbu a5, 0(a1)
+; CHECK-ALIGNED-RV32-NEXT:lbu a6, 1(a1)
+; CHECK-ALIGNED-RV32-NEXT:lbu a7, 3(a1)
+; CHECK-ALIGNED-RV32-NEXT:lbu a1, 2(a1)
+; CHECK-ALIGNED-RV32-NEXT:slli a0, a0, 8
+; CHECK-ALIGNED-RV32-NEXT:or a0, a0, a4
+; CHECK-ALIGNED-RV32-NEXT:slli a3, a3, 16
+; CHECK-ALIGNED-RV32-NEXT:slli a2, a2, 24
+; CHECK-ALIGNED-RV32-NEXT:or a2, a2, a3
+; CHECK-ALIGNED-RV32-NEXT:or a0, a2, a0
+; CHECK-ALIGNED-RV32-NEXT:slli a1, a1, 8
+; CHECK-ALIGNED-RV32-NEXT:or a1, a1, a7
+; CHECK-ALIGNED-RV32-NEXT:slli a6, a6, 16
+; CHECK-ALIGNED-RV32-NEXT:slli a5, a5, 24
+; CHECK-ALIGNED-RV32-NEXT:or a2, a5, a6
+; CHECK-ALIGNED-RV32-NEXT:or a1, a2, a1
+; CHECK-ALIGNED-RV32-NEXT:sltu a2, a1, a0
+; CHECK-ALIGNED-RV32-NEXT:sltu a0, a0, a1
+; CHECK-ALIGNED-RV32-NEXT:sub a0, a2, a0
 ; CHECK-ALIGNED-RV32-NEXT:ret
 ;
 ; CHECK-ALIGNED-RV64-LABEL: memcmp_size_4:
 ; CHECK-ALIGNED-RV64:   # %bb.0: # %entry
-; CHECK-ALIGNED-RV64-NEXT:addi sp, sp, -16
-; CHECK-ALIGNED-RV64-NEXT:sd ra, 8(sp) # 8-byte Folded Spill
-; CHECK-ALIGNED-RV64-NEXT:li a2, 4
-; CHECK-ALIGNED-RV64-NEXT:call memcmp
-; CHECK-ALIGNED-RV64-NEXT:ld ra, 8(sp) # 8-byte Folded Reload
-; CHECK-ALIGNED-RV64-NEXT:addi sp, sp, 16
+; CHECK-ALIGNED-RV64-NEXT:lbu a2, 0(a0)
+; CHECK-ALIGNED-RV64-NEXT:lbu a3, 1(a0)
+; CHECK-ALIGNED-RV64-NEXT:lbu a4, 2(a0)
+; CHECK-ALIGNED-RV64-NEXT:lb a0, 3(a0)
+; CHECK-ALIGNED-RV64-NEXT:lbu a5, 0(a1)
+; CHECK-ALIGNED-RV64-NEXT:lbu a6, 1(a1)
+; CHECK-ALIGNED-RV64-NEXT:lbu a7, 2(a1)
+; CHECK-ALIGNED-RV64-NEXT:lb a1, 3(a1)
+; CHECK-ALIGNED-RV64-NEXT:andi a0, a0, 255
+; CHECK-ALIGNED-RV64-NEXT:slli a4, a4, 8
+; CHECK-ALIGNED-RV64-NEXT:or a0, a4, a0
+; CHECK-ALIGNED-RV64-NEXT:slli a3, a3, 16
+; CHECK-ALIGNED-RV64-NEXT:slliw a2, a2, 24
+; CHECK-ALIGNED-RV64-NEXT:or a2, a2, a3
+; CHECK-ALIGNED-RV64-NEXT:or a0, a2, a0
+; CHECK-ALIGNED-RV64-NEXT:andi a1, a1, 255
+; CHECK-ALIGNED-RV64-NEXT:slli a7, a7, 8
+; CHECK-ALIGNED-RV64-NEXT:or a1, a7, a1
+; CHECK-ALIGNED-RV64-NEXT:slli a6, a6, 16
+; CHECK-ALIGNED-RV64-NEXT:slliw a2, a5, 24
+; CHECK-ALIGNED-RV64-NEXT:or a2, a2, a6
+; CHECK-ALIGNED-RV64-NEXT:or a1, a2, a1
+; CHECK-ALIGNED-RV64-NEXT:sltu a2, a1, a0
+; CHECK-ALIGNED-RV64-NEXT:sltu a0, a0, a1
+; CHECK-ALIGNED-RV64-NEXT:sub a0, a2, a0
 ; CHECK-ALIGNED-RV64-NEXT:ret
 ;
 ; CHECK-UNALIGNED-RV32-LABEL: memcmp_size_4:
 ; CHECK-UNALIGNED-RV32:   # %bb.0: # %entry
-; CHECK-UNALIGNED-RV32-NEXT:addi sp, sp, -16
-; CHECK-UNALIGNED-RV32-NEXT:sw ra, 12(sp) # 4-byte Folded Spill
-; CHECK-UNALIGNED-RV32-NEXT:li a2, 4
-; CHECK-UNALIGNED-RV32-NEXT:call memcmp
-; CHECK-UNALIGNED-RV32-NEXT:lw ra, 12(sp) # 4-byte Folded Reload
-; CHECK-UNALIGNED-RV32-NEXT:addi sp, sp, 16
+; CHECK-UNALIGNED-RV32-NEXT:lw a0, 0(a0)

wangpc-pp wrote:

Here is the code of memcmp copied from glibc: https://godbolt.org/z/4KxPTE6q1
There are many cases (which means branches) in this general implementation;
at the very least, we can benefit from unrolling and from removing branches.
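To illustrate the point (a sketch of mine, not part of the patch): a fixed-size 4-byte memcmp can be expanded branch-free, much like the generated RISC-V sequence in the diff above. `loadBE32` and `memcmp4Expanded` are illustrative names, and the byte swap assumes a little-endian host (as on RISC-V):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Sketch of a branch-free memcmp(s1, s2, 4) expansion: load both words,
// convert to big-endian so an unsigned integer comparison matches memcmp's
// bytewise semantics, then derive the sign from two unsigned compares.
static uint32_t loadBE32(const void *P) {
  uint32_t V;
  std::memcpy(&V, P, sizeof V); // unaligned-safe load
  // Byte swap: assumes a little-endian host, as on RISC-V.
  return (V << 24) | ((V & 0x0000FF00u) << 8) | ((V & 0x00FF0000u) >> 8) |
         (V >> 24);
}

int memcmp4Expanded(const void *S1, const void *S2) {
  uint32_t A = loadBE32(S1), B = loadBE32(S2);
  return (A > B) - (A < B); // -1, 0, or 1; no data-dependent branches
}
```

Unlike the general glibc loop, the whole comparison is straight-line code, which is what the expanded assembly above buys us.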

https://github.com/llvm/llvm-project/pull/107548
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [PAC][CodeGen][ELF][AArch64] Support signed GOT with tiny code model (PR #113812)

2024-11-01 Thread Daniil Kovalev via llvm-branch-commits

https://github.com/kovdan01 updated 
https://github.com/llvm/llvm-project/pull/113812

From c2ffa88c7b9f8e7a6b12cef59c83b288382c402b Mon Sep 17 00:00:00 2001
From: Daniil Kovalev 
Date: Sun, 27 Oct 2024 17:23:17 +0300
Subject: [PATCH] [PAC][CodeGen][ELF][AArch64] Support signed GOT with tiny
 code model

Support the following relocations and assembly operators:

- `R_AARCH64_AUTH_GOT_ADR_PREL_LO21` (`:got_auth:` for `adr`)
- `R_AARCH64_AUTH_GOT_LD_PREL19` (`:got_auth:` for `ldr`)

`LOADgotAUTH` pseudo-instruction is expanded to actual instruction
sequence like the following.

```
adr x16, :got_auth:sym
ldr x0, [x16]
autia x0, x16
```

Both SelectionDAG and GlobalISel are supported. For FastISel, we fall
back to SelectionDAG.

Tests starting with 'ptrauth-' have corresponding variants w/o this prefix.
---
 llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp |  48 +++--
 .../AArch64/AsmParser/AArch64AsmParser.cpp|   8 +-
 .../MCTargetDesc/AArch64ELFObjectWriter.cpp   |  18 ++
 .../CodeGen/AArch64/ptrauth-extern-weak.ll|  42 
 .../CodeGen/AArch64/ptrauth-tiny-model-pic.ll | 182 ++
 .../AArch64/ptrauth-tiny-model-static.ll  | 157 +++
 llvm/test/MC/AArch64/arm64-elf-relocs.s   |  13 ++
 llvm/test/MC/AArch64/ilp32-diagnostics.s  |   6 +
 8 files changed, 455 insertions(+), 19 deletions(-)
 create mode 100644 llvm/test/CodeGen/AArch64/ptrauth-tiny-model-pic.ll
 create mode 100644 llvm/test/CodeGen/AArch64/ptrauth-tiny-model-static.ll

diff --git a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
index e79457f925db66..c2a7450ffb9132 100644
--- a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+++ b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
@@ -2277,28 +2277,40 @@ void AArch64AsmPrinter::LowerLOADgotAUTH(const MachineInstr &MI) {
   const MachineOperand &GAMO = MI.getOperand(1);
   assert(GAMO.getOffset() == 0);
 
-  MachineOperand GAHiOp(GAMO);
-  MachineOperand GALoOp(GAMO);
-  GAHiOp.addTargetFlag(AArch64II::MO_PAGE);
-  GALoOp.addTargetFlag(AArch64II::MO_PAGEOFF | AArch64II::MO_NC);
+  if (MI.getParent()->getParent()->getTarget().getCodeModel() ==
+  CodeModel::Tiny) {
+MCOperand GAMC;
+MCInstLowering.lowerOperand(GAMO, GAMC);
+EmitToStreamer(
+MCInstBuilder(AArch64::ADR).addReg(AArch64::X17).addOperand(GAMC));
+EmitToStreamer(MCInstBuilder(AArch64::LDRXui)
+   .addReg(AuthResultReg)
+   .addReg(AArch64::X17)
+   .addImm(0));
+  } else {
+MachineOperand GAHiOp(GAMO);
+MachineOperand GALoOp(GAMO);
+GAHiOp.addTargetFlag(AArch64II::MO_PAGE);
+GALoOp.addTargetFlag(AArch64II::MO_PAGEOFF | AArch64II::MO_NC);
 
-  MCOperand GAMCHi, GAMCLo;
-  MCInstLowering.lowerOperand(GAHiOp, GAMCHi);
-  MCInstLowering.lowerOperand(GALoOp, GAMCLo);
+MCOperand GAMCHi, GAMCLo;
+MCInstLowering.lowerOperand(GAHiOp, GAMCHi);
+MCInstLowering.lowerOperand(GALoOp, GAMCLo);
 
-  EmitToStreamer(
-  MCInstBuilder(AArch64::ADRP).addReg(AArch64::X17).addOperand(GAMCHi));
+EmitToStreamer(
+MCInstBuilder(AArch64::ADRP).addReg(AArch64::X17).addOperand(GAMCHi));
 
-  EmitToStreamer(MCInstBuilder(AArch64::ADDXri)
- .addReg(AArch64::X17)
- .addReg(AArch64::X17)
- .addOperand(GAMCLo)
- .addImm(0));
+EmitToStreamer(MCInstBuilder(AArch64::ADDXri)
+   .addReg(AArch64::X17)
+   .addReg(AArch64::X17)
+   .addOperand(GAMCLo)
+   .addImm(0));
 
-  EmitToStreamer(MCInstBuilder(AArch64::LDRXui)
- .addReg(AuthResultReg)
- .addReg(AArch64::X17)
- .addImm(0));
+EmitToStreamer(MCInstBuilder(AArch64::LDRXui)
+   .addReg(AuthResultReg)
+   .addReg(AArch64::X17)
+   .addImm(0));
+  }
 
   assert(GAMO.isGlobal());
   MCSymbol *UndefWeakSym;
diff --git a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
index b83ca3f7e52db4..de8e0a4731e419 100644
--- a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+++ b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
@@ -3353,7 +3353,13 @@ ParseStatus AArch64AsmParser::tryParseAdrLabel(OperandVector &Operands) {
   // No modifier was specified at all; this is the syntax for an ELF basic
   // ADR relocation (unfortunately).
   Expr = AArch64MCExpr::create(Expr, AArch64MCExpr::VK_ABS, getContext());
-} else {
+} else if (ELFRefKind != AArch64MCExpr::VK_GOT_AUTH_PAGE) {
+  // For tiny code model, we use :got_auth: operator to fill 21-bit imm of
+  // adr. It's not actually GOT entry page address but the GOT address
+  // itself - we just share the same variant kind with :got_auth: operator
+  // applied for adrp.
+  

[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed TLSDESC (PR #113817)

2024-11-01 Thread Daniil Kovalev via llvm-branch-commits


@@ -1355,6 +1355,36 @@ unsigned RelocationScanner::handleTlsRelocation(RelExpr expr, RelType type,
 return 1;
   }
 
+  auto fatalBothAuthAndNonAuth = [&sym]() {
+fatal("both AUTH and non-AUTH TLSDESC entries for '" + sym.getName() +
+  "' requested, but only one type of TLSDESC entry per symbol is "
+  "supported");
+  };
+
+  // Do not optimize signed TLSDESC as described in pauthabielf64 to LE/IE.
+  // https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#general-restrictions
+  // > PAUTHELF64 only supports the descriptor based TLS (TLSDESC).
+  if (oneof<R_AARCH64_AUTH_TLSDESC_PAGE, R_AARCH64_AUTH_TLSDESC>(expr)) {
+assert(ctx.arg.emachine == EM_AARCH64);
+if (!sym.hasFlag(NEEDS_TLSDESC))
+  sym.setFlags(NEEDS_TLSDESC | NEEDS_TLSDESC_AUTH);
+else if (!sym.hasFlag(NEEDS_TLSDESC_AUTH))
+  fatalBothAuthAndNonAuth();
+sec->addReloc({expr, type, offset, addend, &sym});
+return 1;
+  }
+
+  if (sym.hasFlag(NEEDS_TLSDESC_AUTH)) {
+assert(ctx.arg.emachine == EM_AARCH64);
+// TLSDESC_CALL hint relocation probably should not be emitted by compiler
+// with signed TLSDESC enabled since it does not give any value, but leave a
+// check against that just in case someone uses it.
+if (expr != R_TLSDESC_CALL)
+  fatalBothAuthAndNonAuth();

kovdan01 wrote:

The logic of this code and the code above is as follows: we check the
relocation expression against the AUTH variants.

1. If yes, the symbol should either already have both `NEEDS_TLSDESC` and 
`NEEDS_TLSDESC_AUTH` or have none of them. The symbol having only 
`NEEDS_TLSDESC` means that a non-auth entry was requested previously, and now 
we are requesting an auth one - which is currently not supported.

2. If no, but `NEEDS_TLSDESC_AUTH` flag was already set previously, it means 
that auth entry was requested previously, and non-auth one is requested now, 
which is currently not supported. The only case when we don't emit an error is 
expr equal to `R_TLSDESC_CALL` - this is a hint reloc which does not result in 
a non-auth entry by itself. See comment right above this if statement.
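As a reading aid (my own condensed model, not lld's code), the two-flag state machine described above can be sketched like this; `requestTlsdesc` and its parameters are illustrative names, and it returns false exactly where the linker would call `fatal`:

```cpp
#include <cassert>
#include <cstdint>

// Condensed model of the flag logic described above: NEEDS_TLSDESC records
// that some TLSDESC entry was requested; NEEDS_TLSDESC_AUTH records that it
// was the signed (AUTH) flavor. Mixing flavors for one symbol is an error.
enum : uint16_t { NEEDS_TLSDESC = 1 << 0, NEEDS_TLSDESC_AUTH = 1 << 1 };

struct Sym {
  uint16_t flags = 0;
};

// Returns false where lld would report "both AUTH and non-AUTH TLSDESC
// entries requested".
bool requestTlsdesc(Sym &sym, bool isAuth, bool isTlsdescCallHint) {
  if (isAuth) {
    if (!(sym.flags & NEEDS_TLSDESC)) { // first request decides the flavor
      sym.flags |= NEEDS_TLSDESC | NEEDS_TLSDESC_AUTH;
      return true;
    }
    // OK only if the earlier request was also AUTH.
    return (sym.flags & NEEDS_TLSDESC_AUTH) != 0;
  }
  if (sym.flags & NEEDS_TLSDESC_AUTH)
    return isTlsdescCallHint; // the hint reloc alone requests no entry
  sym.flags |= NEEDS_TLSDESC;
  return true;
}
```

The `R_TLSDESC_CALL` carve-out is the only asymmetry: it never creates a non-auth entry, so it is harmless after an AUTH request.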

https://github.com/llvm/llvm-project/pull/113817
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] ValueTracking: Allow getUnderlyingObject to look at vectors (PR #114311)

2024-11-01 Thread Nikita Popov via llvm-branch-commits

https://github.com/nikic approved this pull request.

A tentative LGTM. I *think* this particular change is fine, but it's a 
dangerous area because all of AA basically does not support vectors of pointers 
at all and treats them as escapes. It wouldn't surprise me if this causes a 
miscompile.

https://github.com/llvm/llvm-project/pull/114311
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed GOT with tiny code model (PR #113816)

2024-11-01 Thread Daniil Kovalev via llvm-branch-commits


@@ -78,6 +78,79 @@ _start:
   adrp x1, :got_auth:zed
   add  x1, x1, :got_auth_lo12:zed
 
+#--- ok-tiny.s
+
+# RUN: llvm-mc -filetype=obj -triple=aarch64-none-linux ok-tiny.s -o ok-tiny.o
+
+# RUN: ld.lld ok-tiny.o a.so -pie -o external-tiny
+# RUN: llvm-readelf -r -S -x .got external-tiny | FileCheck %s --check-prefix=EXTERNAL-TINY
+
+# RUN: ld.lld ok-tiny.o a.o -pie -o local-tiny
+# RUN: llvm-readelf -r -S -x .got -s local-tiny | FileCheck %s --check-prefix=LOCAL-TINY
+
+# EXTERNAL-TINY:  OffsetInfo Type  
  Symbol's Value   Symbol's Name + Addend
+# EXTERNAL-TINY-NEXT: 00020380  0001e201 
R_AARCH64_AUTH_GLOB_DAT  bar + 0
+# EXTERNAL-TINY-NEXT: 00020388  0002e201 
R_AARCH64_AUTH_GLOB_DAT  zed + 0
+
+## Symbol's values for bar and zed are equal since they contain no content 
(see Inputs/shared.s)
+# LOCAL-TINY: OffsetInfo Type  
  Symbol's Value   Symbol's Name + Addend
+# LOCAL-TINY-NEXT:00020320  0411 
R_AARCH64_AUTH_RELATIVE 10260
+# LOCAL-TINY-NEXT:00020328  0411 
R_AARCH64_AUTH_RELATIVE 10260
+
+# EXTERNAL-TINY:  Hex dump of section '.got':
+# EXTERNAL-TINY-NEXT: 0x00020380  0080  00a0
+#   ^^
+#   0b1000 bit 63 address 
diversity = true, bits 61..60 key = IA
+# ^^
+# 0b1010 
bit 63 address diversity = true, bits 61..60 key = DA

kovdan01 wrote:

These lines are not output to be checked and matched; they are comments that 
help explain the contents of the hex dump. I've changed the prefix to `##` so 
it's clear that they are comments and not special lines like RUN/CHECK/etc. See 
e841e190df73a5cbc6639cb40c467623f1b953ac

https://github.com/llvm/llvm-project/pull/113816
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed GOT with tiny code model (PR #113816)

2024-11-01 Thread Daniil Kovalev via llvm-branch-commits

https://github.com/kovdan01 edited 
https://github.com/llvm/llvm-project/pull/113816
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed TLSDESC (PR #113817)

2024-11-01 Thread Daniil Kovalev via llvm-branch-commits

https://github.com/kovdan01 updated 
https://github.com/llvm/llvm-project/pull/113817

From d89a47e22f427f8fe989ca24c9289821c8bda09d Mon Sep 17 00:00:00 2001
From: Daniil Kovalev 
Date: Fri, 25 Oct 2024 12:32:27 +0300
Subject: [PATCH 1/2] [PAC][lld][AArch64][ELF] Support signed TLSDESC

Support `R_AARCH64_AUTH_TLSDESC_ADR_PAGE21`, `R_AARCH64_AUTH_TLSDESC_LD64_LO12`
and `R_AARCH64_AUTH_TLSDESC_ADD_LO12` static TLSDESC relocations.
---
 lld/ELF/Arch/AArch64.cpp |   8 ++
 lld/ELF/InputSection.cpp |   2 +
 lld/ELF/Relocations.cpp  |  38 +++-
 lld/ELF/Relocations.h|   4 +
 lld/ELF/Symbols.h|   1 +
 lld/ELF/SyntheticSections.cpp|   5 +
 lld/test/ELF/aarch64-tlsdesc-pauth.s | 134 +++
 7 files changed, 190 insertions(+), 2 deletions(-)
 create mode 100644 lld/test/ELF/aarch64-tlsdesc-pauth.s

diff --git a/lld/ELF/Arch/AArch64.cpp b/lld/ELF/Arch/AArch64.cpp
index 86f509f3fd78a7..8ad466bf49878b 100644
--- a/lld/ELF/Arch/AArch64.cpp
+++ b/lld/ELF/Arch/AArch64.cpp
@@ -157,9 +157,14 @@ RelExpr AArch64::getRelExpr(RelType type, const Symbol &s,
 return R_AARCH64_AUTH;
   case R_AARCH64_TLSDESC_ADR_PAGE21:
 return R_AARCH64_TLSDESC_PAGE;
+  case R_AARCH64_AUTH_TLSDESC_ADR_PAGE21:
+return R_AARCH64_AUTH_TLSDESC_PAGE;
   case R_AARCH64_TLSDESC_LD64_LO12:
   case R_AARCH64_TLSDESC_ADD_LO12:
 return R_TLSDESC;
+  case R_AARCH64_AUTH_TLSDESC_LD64_LO12:
+  case R_AARCH64_AUTH_TLSDESC_ADD_LO12:
+return RelExpr::R_AARCH64_AUTH_TLSDESC;
   case R_AARCH64_TLSDESC_CALL:
 return R_TLSDESC_CALL;
   case R_AARCH64_TLSLE_ADD_TPREL_HI12:
@@ -543,6 +548,7 @@ void AArch64::relocate(uint8_t *loc, const Relocation &rel,
   case R_AARCH64_ADR_PREL_PG_HI21:
   case R_AARCH64_TLSIE_ADR_GOTTPREL_PAGE21:
   case R_AARCH64_TLSDESC_ADR_PAGE21:
+  case R_AARCH64_AUTH_TLSDESC_ADR_PAGE21:
 checkInt(ctx, loc, val, 33, rel);
 [[fallthrough]];
   case R_AARCH64_ADR_PREL_PG_HI21_NC:
@@ -593,6 +599,7 @@ void AArch64::relocate(uint8_t *loc, const Relocation &rel,
   case R_AARCH64_TLSIE_LD64_GOTTPREL_LO12_NC:
   case R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC:
   case R_AARCH64_TLSDESC_LD64_LO12:
+  case R_AARCH64_AUTH_TLSDESC_LD64_LO12:
 checkAlignment(ctx, loc, val, 8, rel);
 write32Imm12(loc, getBits(val, 3, 11));
 break;
@@ -667,6 +674,7 @@ void AArch64::relocate(uint8_t *loc, const Relocation &rel,
 break;
   case R_AARCH64_TLSLE_ADD_TPREL_LO12_NC:
   case R_AARCH64_TLSDESC_ADD_LO12:
+  case R_AARCH64_AUTH_TLSDESC_ADD_LO12:
 write32Imm12(loc, val);
 break;
   case R_AARCH64_TLSDESC:
diff --git a/lld/ELF/InputSection.cpp b/lld/ELF/InputSection.cpp
index ccc7cf8c6e2de9..b3303c59a3b4a5 100644
--- a/lld/ELF/InputSection.cpp
+++ b/lld/ELF/InputSection.cpp
@@ -935,12 +935,14 @@ uint64_t InputSectionBase::getRelocTargetVA(Ctx &ctx, const Relocation &r,
   case R_SIZE:
 return r.sym->getSize() + a;
   case R_TLSDESC:
+  case RelExpr::R_AARCH64_AUTH_TLSDESC:
 return ctx.in.got->getTlsDescAddr(*r.sym) + a;
   case R_TLSDESC_PC:
 return ctx.in.got->getTlsDescAddr(*r.sym) + a - p;
   case R_TLSDESC_GOTPLT:
 return ctx.in.got->getTlsDescAddr(*r.sym) + a - ctx.in.gotPlt->getVA();
   case R_AARCH64_TLSDESC_PAGE:
+  case R_AARCH64_AUTH_TLSDESC_PAGE:
 return getAArch64Page(ctx.in.got->getTlsDescAddr(*r.sym) + a) -
getAArch64Page(p);
   case R_LOONGARCH_TLSDESC_PAGE_PC:
diff --git a/lld/ELF/Relocations.cpp b/lld/ELF/Relocations.cpp
index dbe0bcfcdc34f6..f53406cbf63566 100644
--- a/lld/ELF/Relocations.cpp
+++ b/lld/ELF/Relocations.cpp
@@ -1352,6 +1352,36 @@ unsigned RelocationScanner::handleTlsRelocation(RelExpr expr, RelType type,
 return 1;
   }
 
+  auto fatalBothAuthAndNonAuth = [&sym]() {
+fatal("both AUTH and non-AUTH TLSDESC entries for '" + sym.getName() +
+  "' requested, but only one type of TLSDESC entry per symbol is "
+  "supported");
+  };
+
+  // Do not optimize signed TLSDESC as described in pauthabielf64 to LE/IE.
+  // https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#general-restrictions
+  // > PAUTHELF64 only supports the descriptor based TLS (TLSDESC).
+  if (oneof<R_AARCH64_AUTH_TLSDESC_PAGE, R_AARCH64_AUTH_TLSDESC>(expr)) {
+assert(ctx.arg.emachine == EM_AARCH64);
+if (!sym.hasFlag(NEEDS_TLSDESC))
+  sym.setFlags(NEEDS_TLSDESC | NEEDS_TLSDESC_AUTH);
+else if (!sym.hasFlag(NEEDS_TLSDESC_AUTH))
+  fatalBothAuthAndNonAuth();
+sec->addReloc({expr, type, offset, addend, &sym});
+return 1;
+  }
+
+  if (sym.hasFlag(NEEDS_TLSDESC_AUTH)) {
+assert(ctx.arg.emachine == EM_AARCH64);
+// TLSDESC_CALL hint relocation probably should not be emitted by compiler
+// with signed TLSDESC enabled since it does not give any value, but leave a
+// check against that just in case someone uses it.
+if (expr != R_TLSDESC_CALL)
+  fatalBothAuthAndNonAuth();
+return 1;
+  }
+
  bool isRISCV = ctx.arg.emachine == EM_RISCV;

[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed GOT with tiny code model (PR #113816)

2024-11-01 Thread Daniil Kovalev via llvm-branch-commits

https://github.com/kovdan01 updated 
https://github.com/llvm/llvm-project/pull/113816

From 4b1795d57490dbcef1cf7ce17739a0d6023e5cca Mon Sep 17 00:00:00 2001
From: Daniil Kovalev 
Date: Fri, 25 Oct 2024 21:28:18 +0300
Subject: [PATCH 1/2] [PAC][lld][AArch64][ELF] Support signed GOT with tiny
 code model

Support `R_AARCH64_AUTH_GOT_ADR_PREL_LO21` and `R_AARCH64_AUTH_GOT_LD_PREL19`
GOT-generating relocations.
---
 lld/ELF/Arch/AArch64.cpp |  5 ++
 lld/ELF/InputSection.cpp |  1 +
 lld/ELF/Relocations.cpp  | 17 ++---
 lld/ELF/Relocations.h|  1 +
 lld/test/ELF/aarch64-got-relocations-pauth.s | 73 
 5 files changed, 89 insertions(+), 8 deletions(-)

diff --git a/lld/ELF/Arch/AArch64.cpp b/lld/ELF/Arch/AArch64.cpp
index 86f509f3fd78a7..2f2e0c2a52b0ef 100644
--- a/lld/ELF/Arch/AArch64.cpp
+++ b/lld/ELF/Arch/AArch64.cpp
@@ -205,6 +205,9 @@ RelExpr AArch64::getRelExpr(RelType type, const Symbol &s,
   case R_AARCH64_AUTH_LD64_GOT_LO12_NC:
   case R_AARCH64_AUTH_GOT_ADD_LO12_NC:
 return R_AARCH64_AUTH_GOT;
+  case R_AARCH64_AUTH_GOT_LD_PREL19:
+  case R_AARCH64_AUTH_GOT_ADR_PREL_LO21:
+return R_AARCH64_AUTH_GOT_PC;
   case R_AARCH64_LD64_GOTPAGE_LO15:
 return R_AARCH64_GOT_PAGE;
   case R_AARCH64_ADR_GOT_PAGE:
@@ -549,6 +552,7 @@ void AArch64::relocate(uint8_t *loc, const Relocation &rel,
 write32AArch64Addr(loc, val >> 12);
 break;
   case R_AARCH64_ADR_PREL_LO21:
+  case R_AARCH64_AUTH_GOT_ADR_PREL_LO21:
 checkInt(ctx, loc, val, 21, rel);
 write32AArch64Addr(loc, val);
 break;
@@ -569,6 +573,7 @@ void AArch64::relocate(uint8_t *loc, const Relocation &rel,
   case R_AARCH64_CONDBR19:
   case R_AARCH64_LD_PREL_LO19:
   case R_AARCH64_GOT_LD_PREL19:
+  case R_AARCH64_AUTH_GOT_LD_PREL19:
 checkAlignment(ctx, loc, val, 4, rel);
 checkInt(ctx, loc, val, 21, rel);
 writeMaskedBits32le(loc, (val & 0x1C) << 3, 0x1C << 3);
diff --git a/lld/ELF/InputSection.cpp b/lld/ELF/InputSection.cpp
index ccc7cf8c6e2de9..ba135afd3580bf 100644
--- a/lld/ELF/InputSection.cpp
+++ b/lld/ELF/InputSection.cpp
@@ -788,6 +788,7 @@ uint64_t InputSectionBase::getRelocTargetVA(Ctx &ctx, const Relocation &r,
   case R_AARCH64_GOT_PAGE:
 return r.sym->getGotVA(ctx) + a - getAArch64Page(ctx.in.got->getVA());
   case R_GOT_PC:
+  case R_AARCH64_AUTH_GOT_PC:
   case R_RELAX_TLS_GD_TO_IE:
 return r.sym->getGotVA(ctx) + a - p;
   case R_GOTPLT_GOTREL:
diff --git a/lld/ELF/Relocations.cpp b/lld/ELF/Relocations.cpp
index dbe0bcfcdc34f6..2e679834add158 100644
--- a/lld/ELF/Relocations.cpp
+++ b/lld/ELF/Relocations.cpp
@@ -210,11 +210,11 @@ static bool needsPlt(RelExpr expr) {
 }
 
 bool lld::elf::needsGot(RelExpr expr) {
-  return oneof(
-  expr);
+  return oneof(expr);
 }
 
 // True if this expression is of the form Sym - X, where X is a position in the
@@ -1011,8 +1011,8 @@ bool RelocationScanner::isStaticLinkTimeConstant(RelExpr e, RelType type,
 R_GOTONLY_PC, R_GOTPLTONLY_PC, R_PLT_PC, R_PLT_GOTREL, R_PLT_GOTPLT,
 R_GOTPLT_GOTREL, R_GOTPLT_PC, R_PPC32_PLTREL, R_PPC64_CALL_PLT,
 R_PPC64_RELAX_TOC, R_RISCV_ADD, R_AARCH64_GOT_PAGE,
-R_AARCH64_AUTH_GOT, R_LOONGARCH_PLT_PAGE_PC, R_LOONGARCH_GOT,
-R_LOONGARCH_GOT_PAGE_PC>(e))
+R_AARCH64_AUTH_GOT, R_AARCH64_AUTH_GOT_PC, R_LOONGARCH_PLT_PAGE_PC,
+R_LOONGARCH_GOT, R_LOONGARCH_GOT_PAGE_PC>(e))
 return true;
 
   // These never do, except if the entire file is position dependent or if
@@ -1126,7 +1126,8 @@ void RelocationScanner::processAux(RelExpr expr, RelType type, uint64_t offset,
   // Many LoongArch TLS relocs reuse the R_LOONGARCH_GOT type, in which
   // case the NEEDS_GOT flag shouldn't get set.
   bool needsGotAuth =
-  (expr == R_AARCH64_AUTH_GOT || expr == R_AARCH64_AUTH_GOT_PAGE_PC);
+  (expr == R_AARCH64_AUTH_GOT || expr == R_AARCH64_AUTH_GOT_PC ||
+   expr == R_AARCH64_AUTH_GOT_PAGE_PC);
   uint16_t flags = sym.flags.load(std::memory_order_relaxed);
   if (!(flags & NEEDS_GOT)) {
 sym.setFlags(needsGotAuth ? (NEEDS_GOT | NEEDS_GOT_AUTH) : NEEDS_GOT);
diff --git a/lld/ELF/Relocations.h b/lld/ELF/Relocations.h
index 20d88de402ac18..38d55d46116569 100644
--- a/lld/ELF/Relocations.h
+++ b/lld/ELF/Relocations.h
@@ -89,6 +89,7 @@ enum RelExpr {
   R_AARCH64_AUTH_GOT_PAGE_PC,
   R_AARCH64_GOT_PAGE,
   R_AARCH64_AUTH_GOT,
+  R_AARCH64_AUTH_GOT_PC,
   R_AARCH64_PAGE_PC,
   R_AARCH64_RELAX_TLS_GD_TO_IE_PAGE_PC,
   R_AARCH64_TLSDESC_PAGE,
diff --git a/lld/test/ELF/aarch64-got-relocations-pauth.s b/lld/test/ELF/aarch64-got-relocations-pauth.s
index ef089b61b6771c..c6cfd0c18b15f9 100644
--- a/lld/test/ELF/aarch64-got-relocations-pauth.s
+++ b/lld/test/ELF/aarch64-got-relocations-pauth.s
@@ -78,6 +78,79 @@ _start:
   adrp x1, :got_auth:zed
   add  x1, x1, :got_auth_lo12:zed
 
+#--- ok-tiny.s
+
+# RUN: ll

[llvm-branch-commits] [RISCV] Support memcmp expansion for vectors (PR #114517)

2024-11-01 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-risc-v

Author: Pengcheng Wang (wangpc-pp)


Changes



---

Patch is 404.53 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/114517.diff


4 Files Affected:

- (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+100-3) 
- (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+5) 
- (modified) llvm/test/CodeGen/RISCV/memcmp-optsize.ll (+920-530) 
- (modified) llvm/test/CodeGen/RISCV/memcmp.ll (+4570-1843) 


``diff
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 3b3f8772a08940..89b4f22a1260db 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -23,6 +23,7 @@
 #include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/MemoryLocation.h"
 #include "llvm/Analysis/VectorUtils.h"
+#include "llvm/CodeGen/ISDOpcodes.h"
 #include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
@@ -14474,17 +14475,116 @@ static bool narrowIndex(SDValue &N, ISD::MemIndexType IndexType, SelectionDAG &D
   return true;
 }
 
+/// Recursive helper for combineVectorSizedSetCCEquality() to see if we have a
+/// recognizable memcmp expansion.
+static bool isOrXorXorTree(SDValue X, bool Root = true) {
+  if (X.getOpcode() == ISD::OR)
+return isOrXorXorTree(X.getOperand(0), false) &&
+   isOrXorXorTree(X.getOperand(1), false);
+  if (Root)
+return false;
+  return X.getOpcode() == ISD::XOR;
+}
+
+/// Recursive helper for combineVectorSizedSetCCEquality() to emit the memcmp
+/// expansion.
+static SDValue emitOrXorXorTree(SDValue X, const SDLoc &DL, SelectionDAG &DAG,
+EVT VecVT, EVT CmpVT) {
+  SDValue Op0 = X.getOperand(0);
+  SDValue Op1 = X.getOperand(1);
+  if (X.getOpcode() == ISD::OR) {
+SDValue A = emitOrXorXorTree(Op0, DL, DAG, VecVT, CmpVT);
+SDValue B = emitOrXorXorTree(Op1, DL, DAG, VecVT, CmpVT);
+if (VecVT != CmpVT)
+  return DAG.getNode(ISD::OR, DL, CmpVT, A, B);
+return DAG.getNode(ISD::AND, DL, CmpVT, A, B);
+  }
+  if (X.getOpcode() == ISD::XOR) {
+SDValue A = DAG.getBitcast(VecVT, Op0);
+SDValue B = DAG.getBitcast(VecVT, Op1);
+if (VecVT != CmpVT)
+  return DAG.getSetCC(DL, CmpVT, A, B, ISD::SETNE);
+return DAG.getSetCC(DL, CmpVT, A, B, ISD::SETEQ);
+  }
+  llvm_unreachable("Impossible");
+}
+
+/// Try to map a 128-bit or larger integer comparison to vector instructions
+/// before type legalization splits it up into chunks.
+static SDValue
+combineVectorSizedSetCCEquality(EVT VT, SDValue X, SDValue Y, ISD::CondCode CC,
+const SDLoc &DL, SelectionDAG &DAG,
+const RISCVSubtarget &Subtarget) {
+  assert((CC == ISD::SETNE || CC == ISD::SETEQ) && "Bad comparison predicate");
+
+  EVT OpVT = X.getValueType();
+  MVT XLenVT = Subtarget.getXLenVT();
+  unsigned OpSize = OpVT.getSizeInBits();
+
+  // We're looking for an oversized integer equality comparison.
+  if (!Subtarget.hasVInstructions() || !OpVT.isScalarInteger() ||
+  OpSize < Subtarget.getRealMinVLen() ||
+  OpSize > Subtarget.getRealMinVLen() * 8)
+return SDValue();
+
+  bool IsOrXorXorTreeCCZero = isNullConstant(Y) && isOrXorXorTree(X);
+  if (isNullConstant(Y) && !IsOrXorXorTreeCCZero)
+return SDValue();
+
+  // Don't perform this combine if constructing the vector will be expensive.
+  auto IsVectorBitCastCheap = [](SDValue X) {
+X = peekThroughBitcasts(X);
+return isa(X) || X.getValueType().isVector() ||
+   X.getOpcode() == ISD::LOAD;
+  };
+  if ((!IsVectorBitCastCheap(X) || !IsVectorBitCastCheap(Y)) &&
+  !IsOrXorXorTreeCCZero)
+return SDValue();
+
+  bool NoImplicitFloatOps =
+  DAG.getMachineFunction().getFunction().hasFnAttribute(
+  Attribute::NoImplicitFloat);
+  if (!NoImplicitFloatOps && Subtarget.hasVInstructions()) {
+unsigned VecSize = OpSize / 8;
+EVT VecVT = MVT::getVectorVT(MVT::i8, VecSize);
+EVT CmpVT = MVT::getVectorVT(MVT::i1, VecSize);
+
+SDValue Cmp;
+if (IsOrXorXorTreeCCZero) {
+  Cmp = emitOrXorXorTree(X, DL, DAG, VecVT, CmpVT);
+} else {
+  SDValue VecX = DAG.getBitcast(VecVT, X);
+  SDValue VecY = DAG.getBitcast(VecVT, Y);
+  Cmp = DAG.getSetCC(DL, CmpVT, VecX, VecY, ISD::SETEQ);
+}
+return DAG.getSetCC(DL, VT,
+DAG.getNode(ISD::VECREDUCE_AND, DL, XLenVT, Cmp),
+DAG.getConstant(0, DL, XLenVT), CC);
+  }
+
+  return SDValue();
+}
+
 // Replace (seteq (i64 (and X, 0x)), C1) with
 // (seteq (i64 (sext_inreg (X, i32)), C1')) where C1' is C1 sign extended from
 // bit 31. Same for setne. C1' may be cheaper to materialize and the sext_inreg
 // can become a sext.w instead of a shift pair.
 static SDValue performSETCCCombine(SDNode *N, Selecti

[llvm-branch-commits] [RISCV] Support memcmp expansion for vectors (PR #114517)

2024-11-01 Thread Pengcheng Wang via llvm-branch-commits

https://github.com/wangpc-pp created https://github.com/llvm/llvm-project/pull/114517

None


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lld] [PAC][lld] Use braa instr in PAC PLT sequence with valid PAuth core info (PR #113945)

2024-11-01 Thread Daniil Kovalev via llvm-branch-commits

https://github.com/kovdan01 updated https://github.com/llvm/llvm-project/pull/113945

From f2daf75b8506e31180f2d41291c6f1a63da5138b Mon Sep 17 00:00:00 2001
From: Daniil Kovalev 
Date: Mon, 28 Oct 2024 21:23:54 +0300
Subject: [PATCH 1/2] [PAC][lld] Use braa instr in PAC PLT sequence with valid
 PAuth core info

Assume PAC instructions are supported when the PAuth core info is different
from (0,0). Given that, `autia1716; br x17` can be replaced with
`braa x17, x16; nop`.
---
 lld/ELF/Arch/AArch64.cpp | 19 +++
 lld/test/ELF/aarch64-feature-pauth.s | 10 ++
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/lld/ELF/Arch/AArch64.cpp b/lld/ELF/Arch/AArch64.cpp
index 260307ac4c3dcb..c76f226bc5511c 100644
--- a/lld/ELF/Arch/AArch64.cpp
+++ b/lld/ELF/Arch/AArch64.cpp
@@ -999,7 +999,9 @@ class AArch64BtiPac final : public AArch64 {
 
 private:
   bool btiHeader; // bti instruction needed in PLT Header and Entry
-  bool pacEntry;  // autia1716 instruction needed in PLT Entry
+  bool pacEntry;  // Authenticated branch needed in PLT Entry
+  bool pacUseHint =
+  true; // Use hint space instructions for authenticated branch in PLT entry
 };
 } // namespace
 
@@ -1016,6 +1018,10 @@ AArch64BtiPac::AArch64BtiPac(Ctx &ctx) : AArch64(ctx) {
   // from properties in the objects, so we use the command line flag.
   pacEntry = ctx.arg.zPacPlt;
 
+  if (llvm::any_of(ctx.aarch64PauthAbiCoreInfo,
+   [](uint8_t c) { return c != 0; }))
+pacUseHint = false;
+
   if (btiHeader || pacEntry) {
 pltEntrySize = 24;
 ipltEntrySize = 24;
@@ -1066,9 +1072,13 @@ void AArch64BtiPac::writePlt(uint8_t *buf, const Symbol &sym,
   0x11, 0x02, 0x40, 0xf9,  // ldr  x17, [x16, Offset(&(.got.plt[n]))]
   0x10, 0x02, 0x00, 0x91   // add  x16, x16, Offset(&(.got.plt[n]))
   };
+  const uint8_t pacHintBr[] = {
+  0x9f, 0x21, 0x03, 0xd5, // autia1716
+  0x20, 0x02, 0x1f, 0xd6  // br   x17
+  };
   const uint8_t pacBr[] = {
-  0x9f, 0x21, 0x03, 0xd5,  // autia1716
-  0x20, 0x02, 0x1f, 0xd6   // br   x17
+  0x30, 0x0a, 0x1f, 0xd7, // braa x17, x16
+  0x1f, 0x20, 0x03, 0xd5  // nop
   };
   const uint8_t stdBr[] = {
   0x20, 0x02, 0x1f, 0xd6,  // br   x17
@@ -1097,7 +1107,8 @@ void AArch64BtiPac::writePlt(uint8_t *buf, const Symbol &sym,
   relocateNoSym(buf + 8, R_AARCH64_ADD_ABS_LO12_NC, gotPltEntryAddr);
 
   if (pacEntry)
-memcpy(buf + sizeof(addrInst), pacBr, sizeof(pacBr));
+memcpy(buf + sizeof(addrInst), (pacUseHint ? pacHintBr : pacBr),
+   sizeof(pacUseHint ? pacHintBr : pacBr));
   else
 memcpy(buf + sizeof(addrInst), stdBr, sizeof(stdBr));
   if (!hasBti)
diff --git a/lld/test/ELF/aarch64-feature-pauth.s b/lld/test/ELF/aarch64-feature-pauth.s
index c11073dba86f24..34f2f2698a26b8 100644
--- a/lld/test/ELF/aarch64-feature-pauth.s
+++ b/lld/test/ELF/aarch64-feature-pauth.s
@@ -56,8 +56,8 @@
 
 # PACPLTTAG:  0x7003 (AARCH64_PAC_PLT)
 
-# RUN: llvm-objdump -d pacplt-nowarn | FileCheck --check-prefix PACPLT -DA=10380 -DB=478 -DC=480 %s
-# RUN: llvm-objdump -d pacplt-warn   | FileCheck --check-prefix PACPLT -DA=10390 -DB=488 -DC=490 %s
+# RUN: llvm-objdump -d pacplt-nowarn | FileCheck --check-prefixes=PACPLT,NOHINT -DA=10380 -DB=478 -DC=480 %s
+# RUN: llvm-objdump -d pacplt-warn   | FileCheck --check-prefixes=PACPLT,HINT -DA=10390 -DB=488 -DC=490 %s
 
 # PACPLT: Disassembly of section .text:
 # PACPLT:  :
@@ -77,8 +77,10 @@
# PACPLT-NEXT: adrp x16, 0x3
 # PACPLT-NEXT: ldr x17, [x16, #0x[[C]]]
 # PACPLT-NEXT: add x16, x16, #0x[[C]]
-# PACPLT-NEXT: autia1716
-# PACPLT-NEXT: br  x17
# NOHINT-NEXT: braa x17, x16
+# NOHINT-NEXT: nop
+# HINT-NEXT:   autia1716
+# HINT-NEXT:   br  x17
 # PACPLT-NEXT: nop
 
 #--- abi-tag-short.s

From 026d7ca30ba8a9a0e1c0242c3e2635c0c76e4500 Mon Sep 17 00:00:00 2001
From: Daniil Kovalev 
Date: Fri, 1 Nov 2024 14:20:44 +0300
Subject: [PATCH 2/2] Address review comments

---
 lld/ELF/Arch/AArch64.cpp | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/lld/ELF/Arch/AArch64.cpp b/lld/ELF/Arch/AArch64.cpp
index c76f226bc5511c..e33971ea5d2499 100644
--- a/lld/ELF/Arch/AArch64.cpp
+++ b/lld/ELF/Arch/AArch64.cpp
@@ -999,9 +999,11 @@ class AArch64BtiPac final : public AArch64 {
 
 private:
   bool btiHeader; // bti instruction needed in PLT Header and Entry
-  bool pacEntry;  // Authenticated branch needed in PLT Entry
-  bool pacUseHint =
-  true; // Use hint space instructions for authenticated branch in PLT entry
+  enum {
+PEK_NoAuth,
+PEK_AuthHint, // use autia1716 instr for authenticated branch in PLT entry
+PEK_Auth, // use braa instr for authenticated branch in PLT entry
+  } pacEntryKind;
 };
 } // namespace
 
@@ -1016,13 +1018,18 @@ AArch64BtiPac::AArch64BtiPac(Ctx &ctx) : AArch64(ctx) {
   // relocations.
   // The PAC PLT en

[llvm-branch-commits] [lld] [PAC][lld] Use braa instr in PAC PLT sequence with valid PAuth core info (PR #113945)

2024-11-01 Thread Daniil Kovalev via llvm-branch-commits


@@ -999,7 +999,9 @@ class AArch64BtiPac final : public AArch64 {
 
 private:
   bool btiHeader; // bti instruction needed in PLT Header and Entry
-  bool pacEntry;  // autia1716 instruction needed in PLT Entry
+  bool pacEntry;  // Authenticated branch needed in PLT Entry

kovdan01 wrote:

Changed to an enum, thanks! It might be worth using a switch statement instead
of `if`s and ternary operators at the end of `AArch64BtiPac::writePlt`, but it
looks like it's readable enough right now, so I left it as is unless there is a
request to change that as well.

https://github.com/llvm/llvm-project/pull/113945


[llvm-branch-commits] [llvm] [AArch64] Define high bits of FPR and GPR registers. (PR #114263)

2024-11-01 Thread Sander de Smalen via llvm-branch-commits


@@ -424,6 +424,58 @@ AArch64RegisterInfo::explainReservedReg(const MachineFunction &MF,
   return {};
 }
 
+static SmallVector ReservedHi = {

sdesmalen-arm wrote:

I don't think there is a bug; the code for moving an instruction goes through 
the list of operands to update the register's liverange. For each physreg it 
then goes through the regunits to calculate/update the liverange for that 
regunit, but only if the regunit is not reserved.

The code that determines if the register is reserved says:
```
// A register unit is considered reserved if all its roots and all their
// super registers are reserved.
```
Without this change to AArch64RegisterInfo.cpp, WZR and XZR are marked as 
reserved, but WZR_HI isn't (because WZR_HI is a sibling of WZR, and 
`markSuperRegs` marks only XZR as reserved), and so `IsReserved` is `false` for 
the WZR_HI regunit.

Why this doesn't fail for AMDGPU I don't know, perhaps these registers are 
always virtual and they never go down this path.

https://github.com/llvm/llvm-project/pull/114263