[llvm-branch-commits] [BOLT] Encode landing pads in BAT (PR #114602)
llvmbot wrote: @llvm/pr-subscribers-bolt Author: Amir Ayupov (aaupov) Changes Reuse secondary entry points vector and include landing pad offsets. Use LSB to encode LPENTRY bit, similar to BRANCHENTRY bit used to distinguish branch and block entries in the address map. Test Plan: updated bolt-address-translation.test --- Full diff: https://github.com/llvm/llvm-project/pull/114602.diff 4 Files Affected: - (modified) bolt/docs/BAT.md (+8-4) - (modified) bolt/include/bolt/Profile/BoltAddressTranslation.h (+3) - (modified) bolt/lib/Profile/BoltAddressTranslation.cpp (+41-29) - (modified) bolt/test/X86/callcont-fallthru.s (+7-1) ``diff diff --git a/bolt/docs/BAT.md b/bolt/docs/BAT.md index 817ad288aa34ba..3b42c36541acd3 100644 --- a/bolt/docs/BAT.md +++ b/bolt/docs/BAT.md @@ -54,7 +54,7 @@ Functions table: | table | | | | Secondary entry | -| points | +| points and LPs | |--| ``` @@ -80,7 +80,7 @@ Hot indices are delta encoded, implicitly starting at zero. | `HotIndex` | Delta, ULEB128 | Index of corresponding hot function in hot functions table | Cold | | `FuncHash` | 8b | Function hash for input function | Hot | | `NumBlocks` | ULEB128 | Number of basic blocks in the original function | Hot | -| `NumSecEntryPoints` | ULEB128 | Number of secondary entry points in the original function | Hot | +| `NumSecEntryPoints` | ULEB128 | Number of secondary entry points and landing pads in the original function | Hot | | `ColdInputSkew` | ULEB128 | Skew to apply to all input offsets | Cold | | `NumEntries` | ULEB128 | Number of address translation entries for a function | Both | | `EqualElems` | ULEB128 | Number of equal offsets in the beginning of a function | Both | @@ -116,7 +116,11 @@ input basic block mapping. ### Secondary Entry Points table The table is emitted for hot fragments only. It contains `NumSecEntryPoints` -offsets denoting secondary entry points, delta encoded, implicitly starting at zero. +offsets denoting secondary entry points and landing pads, delta encoded, +implicitly starting at zero. | Entry | Encoding | Description | | - | | --- | -| `SecEntryPoint` | Delta, ULEB128 | Secondary entry point offset | +| `SecEntryPoint` | Delta, ULEB128 | Secondary entry point offset with `LPENTRY` LSB bit | + +`LPENTRY` bit denotes whether a given offset is a landing pad block. If not set, +the offset is a secondary entry point. diff --git a/bolt/include/bolt/Profile/BoltAddressTranslation.h b/bolt/include/bolt/Profile/BoltAddressTranslation.h index 65b9ba874368f3..62367ca3aebdce 100644 --- a/bolt/include/bolt/Profile/BoltAddressTranslation.h +++ b/bolt/include/bolt/Profile/BoltAddressTranslation.h @@ -181,6 +181,9 @@ class BoltAddressTranslation { /// translation map entry const static uint32_t BRANCHENTRY = 0x1; + /// Identifies a landing pad in secondary entry point map entry. + const static uint32_t LPENTRY = 0x1; + public: /// Map basic block input offset to a basic block index and hash pair. 
class BBHashMapTy { diff --git a/bolt/lib/Profile/BoltAddressTranslation.cpp b/bolt/lib/Profile/BoltAddressTranslation.cpp index ec7e303c0f52e8..9ce62052653e36 100644 --- a/bolt/lib/Profile/BoltAddressTranslation.cpp +++ b/bolt/lib/Profile/BoltAddressTranslation.cpp @@ -86,21 +86,16 @@ void BoltAddressTranslation::write(const BinaryContext &BC, raw_ostream &OS) { if (Function.isIgnored() || (!BC.HasRelocations && !Function.isSimple())) continue; -uint32_t NumSecondaryEntryPoints = 0; -Function.forEachEntryPoint([&](uint64_t Offset, const MCSymbol *) { - if (!Offset) -return true; - ++NumSecondaryEntryPoints; - SecondaryEntryPointsMap[OutputAddress].push_back(Offset); - return true; -}); - LLVM_DEBUG(dbgs() << "Function name: " << Function.getPrintName() << "\n"); LLVM_DEBUG(dbgs() << " Address reference: 0x" << Twine::utohexstr(Function.getOutputAddress()) << "\n"); LLVM_DEBUG(dbgs() << formatv(" Hash: {0:x}\n", getBFHash(InputAddress))); -LLVM_DEBUG(dbgs() << " Secondary Entry Points: " << NumSecondaryEntryPoints - << '\n'); +LLVM_DEBUG({ + uint32_t NumSecondaryEntryPoints = 0; + if (SecondaryEntryPointsMap.count(InputAddress)) +NumSecondaryEntryPoints = SecondaryEntryPointsMap[InputAddress].size(); + dbgs() << " Secondary Entry Points: " << NumSecondaryEntryPoints << '\n'; +}); MapTy Map; for (const BinaryBasicBlock *const BB : @@ -207,10 +202,9 @@ void BoltAddressTranslation::writeMaps(std::map &Maps, << Twine::utohexstr(Address) << ".\n"); encodeULEB128(Address - PrevAddress, OS); PrevAddress = Address; -const uint32_t NumSecondaryEntryPoints = -SecondaryEntryPointsMap.count(Address) -? SecondaryEntryPointsMap[Address].size() -: 0; +u
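(The diff above is truncated in this archive.) To make the new encoding concrete, here is a minimal sketch — not BOLT's actual writer — of the scheme the BAT.md change documents: each secondary-entry-point/landing-pad offset carries the LPENTRY flag in its least significant bit and is delta-encoded as ULEB128, mirroring how BRANCHENTRY tags entries in the address map. The names and the exact order of the delta and flag steps are assumptions.

```cpp
#include "llvm/Support/LEB128.h"
#include "llvm/Support/raw_ostream.h"
#include <cstdint>
#include <utility>
#include <vector>

constexpr uint32_t LPENTRY = 0x1;

// (Offset, IsLandingPad) pairs, assumed sorted by offset.
using EntryPoint = std::pair<uint64_t, bool>;

void encodeEntryPoints(const std::vector<EntryPoint> &Entries,
                       llvm::raw_ostream &OS) {
  uint64_t Prev = 0;
  for (const auto &[Offset, IsLandingPad] : Entries) {
    // Fold the flag into the LSB, then delta-encode (implicit start at 0).
    uint64_t Value = (Offset << 1) | (IsLandingPad ? LPENTRY : 0);
    llvm::encodeULEB128(Value - Prev, OS);
    Prev = Value;
  }
}

// A reader reverses the steps: accumulate the ULEB128 deltas, then
//   bool IsLandingPad = Value & LPENTRY;
//   uint64_t Offset = Value >> 1;
```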
[llvm-branch-commits] [BOLT] Encode landing pads in BAT (PR #114602)
https://github.com/aaupov created https://github.com/llvm/llvm-project/pull/114602 Reuse secondary entry points vector and include landing pad offsets. Use LSB to encode LPENTRY bit, similar to BRANCHENTRY bit used to distinguish branch and block entries in the address map. Test Plan: updated bolt-address-translation.test
[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Make `AAAMDWavesPerEU` honor existing attribute (PR #114438)
https://github.com/shiltian updated https://github.com/llvm/llvm-project/pull/114438 >From fe1979082eea45d70ac6b6112f2eb4c4fdb2fa72 Mon Sep 17 00:00:00 2001 From: Shilei Tian Date: Thu, 31 Oct 2024 12:49:07 -0400 Subject: [PATCH] [WIP][AMDGPU][Attributor] Make `AAAMDWavesPerEU` honor existing attribute --- llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | 79 +++ .../annotate-kernel-features-hsa-call.ll | 46 +-- .../AMDGPU/attributor-loop-issue-58639.ll | 3 +- .../CodeGen/AMDGPU/direct-indirect-call.ll| 3 +- .../CodeGen/AMDGPU/propagate-waves-per-eu.ll | 59 +++--- .../AMDGPU/remove-no-kernel-id-attribute.ll | 9 ++- .../AMDGPU/uniform-work-group-multistep.ll| 3 +- .../uniform-work-group-recursion-test.ll | 2 +- 8 files changed, 111 insertions(+), 93 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp index 8b111cf15575a6..ba2b6159c4f0a2 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp @@ -198,6 +198,17 @@ class AMDGPUInformationCache : public InformationCache { return ST.getWavesPerEU(F, FlatWorkGroupSize); } + std::optional> + getWavesPerEUAttr(const Function &F) { +auto Val = AMDGPU::getIntegerPairAttribute(F, "amdgpu-waves-per-eu", + /*OnlyFirstRequired=*/true); +if (Val && Val->second == 0) { + const GCNSubtarget &ST = TM.getSubtarget(F); + Val->second = ST.getMaxWavesPerEU(); +} +return Val; + } + std::pair getEffectiveWavesPerEU(const Function &F, std::pair WavesPerEU, @@ -768,22 +779,6 @@ struct AAAMDSizeRangeAttribute /*ForceReplace=*/true); } - ChangeStatus emitAttributeIfNotDefault(Attributor &A, unsigned Min, - unsigned Max) { -// Don't add the attribute if it's the implied default. -if (getAssumed().getLower() == Min && getAssumed().getUpper() - 1 == Max) - return ChangeStatus::UNCHANGED; - -Function *F = getAssociatedFunction(); -LLVMContext &Ctx = F->getContext(); -SmallString<10> Buffer; -raw_svector_ostream OS(Buffer); -OS << getAssumed().getLower() << ',' << getAssumed().getUpper() - 1; -return A.manifestAttrs(getIRPosition(), - {Attribute::get(Ctx, AttrName, OS.str())}, - /*ForceReplace=*/true); - } - const std::string getAsStr(Attributor *) const override { std::string Str; raw_string_ostream OS(Str); @@ -879,29 +874,47 @@ struct AAAMDWavesPerEU : public AAAMDSizeRangeAttribute { AAAMDWavesPerEU(const IRPosition &IRP, Attributor &A) : AAAMDSizeRangeAttribute(IRP, A, "amdgpu-waves-per-eu") {} - bool isValidState() const override { -return !Assumed.isEmptySet() && IntegerRangeState::isValidState(); - } - void initialize(Attributor &A) override { Function *F = getAssociatedFunction(); auto &InfoCache = static_cast(A.getInfoCache()); -if (const auto *AssumedGroupSize = A.getAAFor( -*this, IRPosition::function(*F), DepClassTy::REQUIRED); -AssumedGroupSize->isValidState()) { +auto TakeRange = [&](std::pair R) { + auto [Min, Max] = R; + ConstantRange Range(APInt(32, Min), APInt(32, Max + 1)); + IntegerRangeState RangeState(Range); + clampStateAndIndicateChange(this->getState(), RangeState); + indicateOptimisticFixpoint(); +}; - unsigned Min, Max; - std::tie(Min, Max) = InfoCache.getWavesPerEU( - *F, {AssumedGroupSize->getAssumed().getLower().getZExtValue(), - AssumedGroupSize->getAssumed().getUpper().getZExtValue() - 1}); +std::pair MaxWavesPerEURange{ +1U, InfoCache.getMaxWavesPerEU(*F)}; - ConstantRange Range(APInt(32, Min), APInt(32, Max + 1)); - intersectKnown(Range); +// If the attribute exists, we will honor it if it is not the default. 
+if (auto Attr = InfoCache.getWavesPerEUAttr(*F)) { + if (*Attr != MaxWavesPerEURange) { +TakeRange(*Attr); +return; + } } -if (AMDGPU::isEntryFunctionCC(F->getCallingConv())) - indicatePessimisticFixpoint(); +// Unlike AAAMDFlatWorkGroupSize, it's getting trickier here. Since the +// calculation of waves per EU involves flat work group size, we can't +// simply use an assumed flat work group size as a start point, because the +// update of flat work group size is in an inverse direction of waves per +// EU. However, we can still do something if it is an entry function. Since +// an entry function is a terminal node, and flat work group size either +// from attribute or default will be used anyway, we can take that value and +// calculate the waves per EU based on it. This result can't be updated by +// no means, but that could still allow us
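(Truncated above.) The semantic the new `getWavesPerEUAttr` helper encodes, visible in the diff, is that `"amdgpu-waves-per-eu"` is a `"min[,max]"` integer pair whose missing or zero upper bound defaults to the subtarget's maximum; the abstract attribute then honors any non-default pair outright instead of waiting for propagation. A standalone sketch of that parsing rule, with illustrative names — the real code goes through `AMDGPU::getIntegerPairAttribute`:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Parse "min" or "min,max"; 0 (or an absent max) means "subtarget maximum".
std::optional<std::pair<unsigned, unsigned>>
parseWavesPerEU(const std::string &Attr, unsigned SubtargetMaxWavesPerEU) {
  if (Attr.empty())
    return std::nullopt; // attribute not present at all
  size_t Comma = Attr.find(',');
  unsigned Min = std::stoul(Attr.substr(0, Comma));
  unsigned Max = 0;
  if (Comma != std::string::npos && Comma + 1 < Attr.size())
    Max = std::stoul(Attr.substr(Comma + 1));
  if (Max == 0)
    Max = SubtargetMaxWavesPerEU;
  return std::make_pair(Min, Max);
}
```

With that reading, an attribute equal to the default range [1, max] is ignored so the AA can still tighten it; anything else is taken as a fixed point, which is what the `TakeRange` lambda in the diff does.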
[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline (PR #114577)
https://github.com/shiltian updated https://github.com/llvm/llvm-project/pull/114577 >From 488643ca48229d9c48d9b28916fd887b8be15205 Mon Sep 17 00:00:00 2001 From: Shilei Tian Date: Fri, 1 Nov 2024 12:39:52 -0400 Subject: [PATCH] [PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline --- clang/lib/CodeGen/BackendUtil.cpp | 22 + llvm/include/llvm/Passes/PassBuilder.h| 20 +++- llvm/lib/Passes/PassBuilderPipelines.cpp | 24 +++ .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 16 - .../CodeGen/AMDGPU/print-pipeline-passes.ll | 1 + llvm/tools/opt/NewPMDriver.cpp| 4 ++-- 6 files changed, 53 insertions(+), 34 deletions(-) diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp index 47a30f00612eb7..70035a5e069a90 100644 --- a/clang/lib/CodeGen/BackendUtil.cpp +++ b/clang/lib/CodeGen/BackendUtil.cpp @@ -674,7 +674,7 @@ static void addKCFIPass(const Triple &TargetTriple, const LangOptions &LangOpts, // Ensure we lower KCFI operand bundles with -O0. PB.registerOptimizerLastEPCallback( - [&](ModulePassManager &MPM, OptimizationLevel Level) { + [&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) { if (Level == OptimizationLevel::O0 && LangOpts.Sanitize.has(SanitizerKind::KCFI)) MPM.addPass(createModuleToFunctionPassAdaptor(KCFIPass())); @@ -693,8 +693,8 @@ static void addKCFIPass(const Triple &TargetTriple, const LangOptions &LangOpts, static void addSanitizers(const Triple &TargetTriple, const CodeGenOptions &CodeGenOpts, const LangOptions &LangOpts, PassBuilder &PB) { - auto SanitizersCallback = [&](ModulePassManager &MPM, -OptimizationLevel Level) { + auto SanitizersCallback = [&](ModulePassManager &MPM, OptimizationLevel Level, +ThinOrFullLTOPhase) { if (CodeGenOpts.hasSanitizeCoverage()) { auto SancovOpts = getSancovOptsFromCGOpts(CodeGenOpts); MPM.addPass(SanitizerCoveragePass( @@ -778,9 +778,10 @@ static void addSanitizers(const Triple &TargetTriple, }; if (ClSanitizeOnOptimizerEarlyEP) { PB.registerOptimizerEarlyEPCallback( -[SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level) { +[SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level, + ThinOrFullLTOPhase Phase) { ModulePassManager NewMPM; - SanitizersCallback(NewMPM, Level); + SanitizersCallback(NewMPM, Level, Phase); if (!NewMPM.isEmpty()) { // Sanitizers can abandon. NewMPM.addPass(RequireAnalysisPass()); @@ -1058,11 +1059,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline( // TODO: Consider passing the MemoryProfileOutput to the pass builder via // the PGOOptions, and set this up there. if (!CodeGenOpts.MemoryProfileOutput.empty()) { - PB.registerOptimizerLastEPCallback( - [](ModulePassManager &MPM, OptimizationLevel Level) { -MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass())); -MPM.addPass(ModuleMemProfilerPass()); - }); + PB.registerOptimizerLastEPCallback([](ModulePassManager &MPM, +OptimizationLevel Level, +ThinOrFullLTOPhase) { +MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass())); +MPM.addPass(ModuleMemProfilerPass()); + }); } if (CodeGenOpts.FatLTO) { diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h index 565fd2ab2147e5..e7bc3a58f414f1 100644 --- a/llvm/include/llvm/Passes/PassBuilder.h +++ b/llvm/include/llvm/Passes/PassBuilder.h @@ -490,7 +490,8 @@ class PassBuilder { /// This extension point allows adding optimizations before the function /// optimization pipeline. 
void registerOptimizerEarlyEPCallback( - const std::function &C) { + const std::function &C) { OptimizerEarlyEPCallbacks.push_back(C); } @@ -499,7 +500,8 @@ class PassBuilder { /// This extension point allows adding optimizations at the very end of the /// function optimization pipeline. void registerOptimizerLastEPCallback( - const std::function &C) { + const std::function &C) { OptimizerLastEPCallbacks.push_back(C); } @@ -630,9 +632,11 @@ class PassBuilder { void invokeVectorizerStartEPCallbacks(FunctionPassManager &FPM, OptimizationLevel Level); void invokeOptimizerEarlyEPCallbacks(ModulePassManager &MPM, - OptimizationLevel Level); + OptimizationLevel Level, + ThinOrFullLTOPhase Phase); void invokeOpt
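(Truncated above.) For out-of-tree pipeline plugins, the practical fallout is that optimizer-early/last extension-point callbacks gain a `ThinOrFullLTOPhase` parameter. A hedged migration sketch with a placeholder pass; callbacks that do not care about the phase can simply leave the parameter unnamed, exactly as the updated clang callbacks in this diff do:

```cpp
#include "llvm/IR/PassManager.h"
#include "llvm/Passes/PassBuilder.h"

using namespace llvm;

// Placeholder standing in for whatever a client schedules at this point.
struct MyModulePass : PassInfoMixin<MyModulePass> {
  PreservedAnalyses run(Module &, ModuleAnalysisManager &) {
    return PreservedAnalyses::all();
  }
};

void registerMyCallback(PassBuilder &PB) {
  PB.registerOptimizerLastEPCallback(
      [](ModulePassManager &MPM, OptimizationLevel Level,
         ThinOrFullLTOPhase /*Phase*/) {
        if (Level != OptimizationLevel::O0)
          MPM.addPass(MyModulePass());
      });
}
```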
[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed TLSDESC (PR #113817)
https://github.com/ilovepi commented: I think this is basically good from my perspective, but I'd like one of the longtime LLD maintainers, and maybe someone more experienced w/ PAC to chime in before landing. Maybe @smithp35, @MaskRay, or @kbeyls have some thoughts? https://github.com/llvm/llvm-project/pull/113817
[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed TLSDESC (PR #113817)
https://github.com/ilovepi edited https://github.com/llvm/llvm-project/pull/113817
[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed TLSDESC (PR #113817)
@@ -0,0 +1,134 @@ +// REQUIRES: aarch64 +// RUN: rm -rf %t && split-file %s %t && cd %t + +//--- a.s + +.section .tbss,"awT",@nobits +.global a +a: +.xword 0 + +//--- ok.s + +// RUN: llvm-mc -filetype=obj -triple=aarch64-pc-linux -mattr=+pauth ok.s -o ok.o +// RUN: ld.lld -shared ok.o -o ok.so +// RUN: llvm-objdump --no-print-imm-hex -d --no-show-raw-insn ok.so | \ +// RUN: FileCheck -DP=20 -DA=896 -DB=912 -DC=928 %s +// RUN: llvm-readobj -r -x .got ok.so | FileCheck --check-prefix=REL \ +// RUN: -DP1=20 -DA1=380 -DB1=390 -DC1=3A0 -DP2=020 -DA2=380 -DB2=390 -DC2=3a0 %s + +// RUN: llvm-mc -filetype=obj -triple=aarch64-pc-linux -mattr=+pauth a.s -o a.so.o +// RUN: ld.lld -shared a.so.o -soname=so -o a.so +// RUN: ld.lld ok.o a.so -o ok +// RUN: llvm-objdump --no-print-imm-hex -d --no-show-raw-insn ok | \ +// RUN: FileCheck -DP=220 -DA=936 -DB=952 -DC=968 %s +// RUN: llvm-readobj -r -x .got ok | FileCheck --check-prefix=REL \ +// RUN: -DP1=220 -DA1=3A8 -DB1=3B8 -DC1=3C8 -DP2=220 -DA2=3a8 -DB2=3b8 -DC2=3c8 %s + +.text +adrpx0, :tlsdesc_auth:a +ldr x16, [x0, :tlsdesc_auth_lo12:a] +add x0, x0, :tlsdesc_auth_lo12:a +.tlsdesccall a +blraa x16, x0 + +// CHECK: adrpx0, 0x[[P]]000 +// CHECK-NEXT: ldr x16, [x0, #[[A]]] +// CHECK-NEXT: add x0, x0, #[[A]] +// CHECK-NEXT: blraa x16, x0 + +// Create relocation against local TLS symbols where linker should +// create target specific dynamic TLSDESC relocation where addend is +// the symbol VMA in tls block. + +adrpx0, :tlsdesc_auth:local1 +ldr x16, [x0, :tlsdesc_auth_lo12:local1] +add x0, x0, :tlsdesc_auth_lo12:local1 +.tlsdesccall local1 +blraa x16, x0 + +// CHECK: adrpx0, 0x[[P]]000 +// CHECK-NEXT: ldr x16, [x0, #[[B]]] +// CHECK-NEXT: add x0, x0, #[[B]] +// CHECK-NEXT: blraa x16, x0 + +adrpx0, :tlsdesc_auth:local2 +ldr x16, [x0, :tlsdesc_auth_lo12:local2] +add x0, x0, :tlsdesc_auth_lo12:local2 +.tlsdesccall local2 +blraa x16, x0 + +// CHECK: adrpx0, 0x[[P]]000 +// CHECK-NEXT: ldr x16, [x0, #[[C]]] +// CHECK-NEXT: add x0, x0, #[[C]] +// CHECK-NEXT: blraa x16, x0 + +.section .tbss,"awT",@nobits +.type local1,@object +.p2align 2 +local1: +.word 0 +.size local1, 4 + +.type local2,@object +.p2align 3 +local2: +.xword 0 +.size local2, 8 + + +// R_AARCH64_AUTH_TLSDESC - 0x0 -> start of tls block +// R_AARCH64_AUTH_TLSDESC - 0x8 -> align (sizeof (local1), 8) + +// REL: Relocations [ +// REL-NEXT: Section (5) .rela.dyn { +// REL-NEXT: 0x[[P1]][[B1]] R_AARCH64_AUTH_TLSDESC - 0x0 +// REL-NEXT: 0x[[P1]][[C1]] R_AARCH64_AUTH_TLSDESC - 0x8 +// REL-NEXT: 0x[[P1]][[A1]] R_AARCH64_AUTH_TLSDESC a 0x0 +// REL-NEXT: } +// REL-NEXT: ] + +// REL: Hex dump of section '.got': +// REL-NEXT: 0x00[[P2]][[A2]] 0080 00a0 +// REL-NEXT: 0x00[[P2]][[B2]] 0080 00a0 +// REL-NEXT: 0x00[[P2]][[C2]] 0080 00a0 +// ^^ +// 0b1000 bit 63 address diversity = true, bits 61..60 key = IA +// ^^ +// 0b1010 bit 63 address diversity = true, bits 61..60 key = DA ilovepi wrote: Should these be checked? https://github.com/llvm/llvm-project/pull/113817 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
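For reference, the bit layout those trailing comments describe — bit 63 = address diversity, bits 61..60 = signing key, with 0b00 = IA and 0b10 = DA — can be decoded as below. This is an illustrative helper, not lld code, and it ignores the addend/discriminator fields in the low bits:

```cpp
#include <cstdint>
#include <cstdio>

void dumpAuthGotSchema(uint64_t Entry) {
  bool AddrDiversity = Entry >> 63;   // bit 63
  unsigned Key = (Entry >> 60) & 0x3; // bits 61..60
  static const char *KeyNames[] = {"IA", "IB", "DA", "DB"};
  std::printf("address diversity = %s, key = %s\n",
              AddrDiversity ? "true" : "false", KeyNames[Key]);
}

// From the dump above: 0x8000... decodes as IA, 0xa000... as DA.
```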
[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed TLSDESC (PR #113817)
@@ -1355,6 +1355,36 @@ unsigned RelocationScanner::handleTlsRelocation(RelExpr expr, RelType type, return 1; } + auto fatalBothAuthAndNonAuth = [&sym]() { +fatal("both AUTH and non-AUTH TLSDESC entries for '" + sym.getName() + + "' requested, but only one type of TLSDESC entry per symbol is " + "supported"); + }; + + // Do not optimize signed TLSDESC as described in pauthabielf64 to LE/IE. + // https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#general-restrictions + // > PAUTHELF64 only supports the descriptor based TLS (TLSDESC). + if (oneof( + expr)) { +assert(ctx.arg.emachine == EM_AARCH64); +if (!sym.hasFlag(NEEDS_TLSDESC)) + sym.setFlags(NEEDS_TLSDESC | NEEDS_TLSDESC_AUTH); +else if (!sym.hasFlag(NEEDS_TLSDESC_AUTH)) + fatalBothAuthAndNonAuth(); +sec->addReloc({expr, type, offset, addend, &sym}); +return 1; + } + + if (sym.hasFlag(NEEDS_TLSDESC_AUTH)) { +assert(ctx.arg.emachine == EM_AARCH64); +// TLSDESC_CALL hint relocation probably should not be emitted by compiler +// with signed TLSDESC enabled since it does not give any value, but leave a +// check against that just in case someone uses it. +if (expr != R_TLSDESC_CALL) + fatalBothAuthAndNonAuth(); ilovepi wrote: Thanks for the clarification. I was thinking something could reach here w/ `NEEDS_TLSDESC_AUTH` set that isn't a `TLSDESC_CALL` reloc, but wouldn't be both Auth and non-Auth (e.g. just plain invalid, rather than this flavor of invalid). https://github.com/llvm/llvm-project/pull/113817 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed GOT with tiny code model (PR #113816)
@@ -78,6 +78,79 @@ _start: adrp x1, :got_auth:zed add x1, x1, :got_auth_lo12:zed +#--- ok-tiny.s + +# RUN: llvm-mc -filetype=obj -triple=aarch64-none-linux ok-tiny.s -o ok-tiny.o + +# RUN: ld.lld ok-tiny.o a.so -pie -o external-tiny +# RUN: llvm-readelf -r -S -x .got external-tiny | FileCheck %s --check-prefix=EXTERNAL-TINY + +# RUN: ld.lld ok-tiny.o a.o -pie -o local-tiny +# RUN: llvm-readelf -r -S -x .got -s local-tiny | FileCheck %s --check-prefix=LOCAL-TINY + +# EXTERNAL-TINY: OffsetInfo Type Symbol's Value Symbol's Name + Addend +# EXTERNAL-TINY-NEXT: 00020380 0001e201 R_AARCH64_AUTH_GLOB_DAT bar + 0 +# EXTERNAL-TINY-NEXT: 00020388 0002e201 R_AARCH64_AUTH_GLOB_DAT zed + 0 + +## Symbol's values for bar and zed are equal since they contain no content (see Inputs/shared.s) +# LOCAL-TINY: OffsetInfo Type Symbol's Value Symbol's Name + Addend +# LOCAL-TINY-NEXT:00020320 0411 R_AARCH64_AUTH_RELATIVE 10260 +# LOCAL-TINY-NEXT:00020328 0411 R_AARCH64_AUTH_RELATIVE 10260 + +# EXTERNAL-TINY: Hex dump of section '.got': +# EXTERNAL-TINY-NEXT: 0x00020380 0080 00a0 +# ^^ +# 0b1000 bit 63 address diversity = true, bits 61..60 key = IA +# ^^ +# 0b1010 bit 63 address diversity = true, bits 61..60 key = DA ilovepi wrote: Ah, thanks for the explanation. I think I made a similar comment in one of the other PRs before I saw this. Feel free to ignore that/mark it resolved. https://github.com/llvm/llvm-project/pull/113816
[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed GOT with tiny code model (PR #113816)
https://github.com/ilovepi commented: Again, LGTM, but let's get another maintainer to take a look before landing. https://github.com/llvm/llvm-project/pull/113816
[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed GOT with tiny code model (PR #113816)
ilovepi wrote: > Again, LGTM, but let's get another maintainer to take a look before landing. Well, assuming presubmit is working. I see a number of test failures, ATM. https://github.com/llvm/llvm-project/pull/113816
[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (PR #114547)
@@ -821,8 +825,15 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) { PM.addPass(AMDGPUSwLowerLDSPass(*this)); if (EnableLowerModuleLDS) PM.addPass(AMDGPULowerModuleLDSPass(*this)); -if (EnableAMDGPUAttributor && Level != OptimizationLevel::O0) - PM.addPass(AMDGPUAttributorPass(*this)); +if (Level != OptimizationLevel::O0) { + if (EnableAMDGPUAttributor) +PM.addPass(AMDGPUAttributorPass(*this)); + // Do we really need internalization in LTO? + if (InternalizeSymbols) { shiltian wrote: This needs to be moved before attributor. https://github.com/llvm/llvm-project/pull/114547
[llvm-branch-commits] [RISCV] Support memcmp expansion for vectors (PR #114517)
https://github.com/wangpc-pp edited https://github.com/llvm/llvm-project/pull/114517
[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (PR #114547)
https://github.com/shiltian updated https://github.com/llvm/llvm-project/pull/114547 >From 912283a403e1a3a95ebead98467cc743024b5455 Mon Sep 17 00:00:00 2001 From: Shilei Tian Date: Fri, 1 Nov 2024 10:51:20 -0400 Subject: [PATCH] [PassBuilder] Add `LTOPreLink` to early simplication EP call backs The early simplication pipeline is used in non-LTO and (Thin/Full)LTO pre-link stage. There are some passes that we want them in non-LTO mode, but not at LTO pre-link stage. The control is missing currently. This PR adds the support. To demonstrate the use, we only enable the internalization pass in non-LTO mode for AMDGPU because having it run in pre-link stage causes some issues. --- clang/lib/CodeGen/BackendUtil.cpp | 3 ++- llvm/include/llvm/Passes/PassBuilder.h| 10 +++--- llvm/lib/Passes/PassBuilderPipelines.cpp | 8 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 19 +++ llvm/lib/Target/BPF/BPFTargetMachine.cpp | 2 +- .../CodeGen/AMDGPU/print-pipeline-passes.ll | 8 llvm/tools/opt/NewPMDriver.cpp| 2 +- 7 files changed, 38 insertions(+), 14 deletions(-) diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp index ae33554a66b6b5..47a30f00612eb7 100644 --- a/clang/lib/CodeGen/BackendUtil.cpp +++ b/clang/lib/CodeGen/BackendUtil.cpp @@ -993,7 +993,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline( createModuleToFunctionPassAdaptor(ObjCARCExpandPass())); }); PB.registerPipelineEarlySimplificationEPCallback( - [](ModulePassManager &MPM, OptimizationLevel Level) { + [](ModulePassManager &MPM, OptimizationLevel Level, + ThinOrFullLTOPhase) { if (Level != OptimizationLevel::O0) MPM.addPass(ObjCARCAPElimPass()); }); diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h index 0ebfdbb7865fdd..565fd2ab2147e5 100644 --- a/llvm/include/llvm/Passes/PassBuilder.h +++ b/llvm/include/llvm/Passes/PassBuilder.h @@ -480,7 +480,8 @@ class PassBuilder { /// This extension point allows adding optimization right after passes that do /// basic simplification of the input IR. void registerPipelineEarlySimplificationEPCallback( - const std::function &C) { + const std::function &C) { PipelineEarlySimplificationEPCallbacks.push_back(C); } @@ -639,7 +640,8 @@ class PassBuilder { void invokePipelineStartEPCallbacks(ModulePassManager &MPM, OptimizationLevel Level); void invokePipelineEarlySimplificationEPCallbacks(ModulePassManager &MPM, -OptimizationLevel Level); +OptimizationLevel Level, +ThinOrFullLTOPhase Phase); static bool checkParametrizedPassName(StringRef Name, StringRef PassName) { if (!Name.consume_front(PassName)) @@ -764,7 +766,9 @@ class PassBuilder { FullLinkTimeOptimizationLastEPCallbacks; SmallVector, 2> PipelineStartEPCallbacks; - SmallVector, 2> + SmallVector, + 2> PipelineEarlySimplificationEPCallbacks; SmallVector, 2> diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp index 466fbcd7bb7703..9c90accd9c376b 100644 --- a/llvm/lib/Passes/PassBuilderPipelines.cpp +++ b/llvm/lib/Passes/PassBuilderPipelines.cpp @@ -384,9 +384,9 @@ void PassBuilder::invokePipelineStartEPCallbacks(ModulePassManager &MPM, C(MPM, Level); } void PassBuilder::invokePipelineEarlySimplificationEPCallbacks( -ModulePassManager &MPM, OptimizationLevel Level) { +ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase Phase) { for (auto &C : PipelineEarlySimplificationEPCallbacks) -C(MPM, Level); +C(MPM, Level, Phase); } // Helper to add AnnotationRemarksPass. 
@@ -1140,7 +1140,7 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level, MPM.addPass(LowerTypeTestsPass(nullptr, nullptr, lowertypetests::DropTestKind::Assume)); - invokePipelineEarlySimplificationEPCallbacks(MPM, Level); + invokePipelineEarlySimplificationEPCallbacks(MPM, Level, Phase); // Interprocedural constant propagation now that basic cleanup has occurred // and prior to optimizing globals. @@ -2153,7 +2153,7 @@ PassBuilder::buildO0DefaultPipeline(OptimizationLevel Level, if (PGOOpt && PGOOpt->DebugInfoForProfiling) MPM.addPass(createModuleToFunctionPassAdaptor(AddDiscriminatorsPass())); - invokePipelineEarlySimplificationEPCallbacks(MPM, Level); + invokePipelineEarlySimplificationEPCallbacks(MPM, Level, Phase); // Build a minimal pipeline based on the semantics required by LLVM, // which is just that always inlining occurs. Further, disable generating diff --
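(Truncated above.) The motivating use from the commit message — running AMDGPU's internalization only in a non-LTO compile — looks roughly like the following against the new hook. The function name and preservation predicate here are placeholders; the real pass gates this on its `InternalizeSymbols` option and its own preservation rules, per the diff:

```cpp
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Transforms/IPO/Internalize.h"

using namespace llvm;

void registerInternalizeOutsideLTO(PassBuilder &PB) {
  PB.registerPipelineEarlySimplificationEPCallback(
      [](ModulePassManager &MPM, OptimizationLevel Level,
         ThinOrFullLTOPhase Phase) {
        // At (Thin/Full)LTO pre-link, internalizing would hide symbols the
        // LTO link step still needs, so only run in a plain compile.
        if (Level == OptimizationLevel::O0 ||
            Phase == ThinOrFullLTOPhase::ThinLTOPreLink ||
            Phase == ThinOrFullLTOPhase::FullLTOPreLink)
          return;
        MPM.addPass(InternalizePass([](const GlobalValue &GV) {
          return GV.getName() == "main"; // placeholder preservation rule
        }));
      });
}
```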
[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline (PR #114577)
llvmbot wrote: @llvm/pr-subscribers-clang-codegen Author: Shilei Tian (shiltian) Changes --- Full diff: https://github.com/llvm/llvm-project/pull/114577.diff 6 Files Affected: - (modified) clang/lib/CodeGen/BackendUtil.cpp (+12-10) - (modified) llvm/include/llvm/Passes/PassBuilder.h (+14-6) - (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+14-10) - (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+10-6) - (modified) llvm/test/CodeGen/AMDGPU/print-pipeline-passes.ll (+1) - (modified) llvm/tools/opt/NewPMDriver.cpp (+2-2) ``diff diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp index 47a30f00612eb7..70035a5e069a90 100644 --- a/clang/lib/CodeGen/BackendUtil.cpp +++ b/clang/lib/CodeGen/BackendUtil.cpp @@ -674,7 +674,7 @@ static void addKCFIPass(const Triple &TargetTriple, const LangOptions &LangOpts, // Ensure we lower KCFI operand bundles with -O0. PB.registerOptimizerLastEPCallback( - [&](ModulePassManager &MPM, OptimizationLevel Level) { + [&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) { if (Level == OptimizationLevel::O0 && LangOpts.Sanitize.has(SanitizerKind::KCFI)) MPM.addPass(createModuleToFunctionPassAdaptor(KCFIPass())); @@ -693,8 +693,8 @@ static void addKCFIPass(const Triple &TargetTriple, const LangOptions &LangOpts, static void addSanitizers(const Triple &TargetTriple, const CodeGenOptions &CodeGenOpts, const LangOptions &LangOpts, PassBuilder &PB) { - auto SanitizersCallback = [&](ModulePassManager &MPM, -OptimizationLevel Level) { + auto SanitizersCallback = [&](ModulePassManager &MPM, OptimizationLevel Level, +ThinOrFullLTOPhase) { if (CodeGenOpts.hasSanitizeCoverage()) { auto SancovOpts = getSancovOptsFromCGOpts(CodeGenOpts); MPM.addPass(SanitizerCoveragePass( @@ -778,9 +778,10 @@ static void addSanitizers(const Triple &TargetTriple, }; if (ClSanitizeOnOptimizerEarlyEP) { PB.registerOptimizerEarlyEPCallback( -[SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level) { +[SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level, + ThinOrFullLTOPhase Phase) { ModulePassManager NewMPM; - SanitizersCallback(NewMPM, Level); + SanitizersCallback(NewMPM, Level, Phase); if (!NewMPM.isEmpty()) { // Sanitizers can abandon. NewMPM.addPass(RequireAnalysisPass()); @@ -1058,11 +1059,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline( // TODO: Consider passing the MemoryProfileOutput to the pass builder via // the PGOOptions, and set this up there. if (!CodeGenOpts.MemoryProfileOutput.empty()) { - PB.registerOptimizerLastEPCallback( - [](ModulePassManager &MPM, OptimizationLevel Level) { -MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass())); -MPM.addPass(ModuleMemProfilerPass()); - }); + PB.registerOptimizerLastEPCallback([](ModulePassManager &MPM, +OptimizationLevel Level, +ThinOrFullLTOPhase) { +MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass())); +MPM.addPass(ModuleMemProfilerPass()); + }); } if (CodeGenOpts.FatLTO) { diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h index 565fd2ab2147e5..e7bc3a58f414f1 100644 --- a/llvm/include/llvm/Passes/PassBuilder.h +++ b/llvm/include/llvm/Passes/PassBuilder.h @@ -490,7 +490,8 @@ class PassBuilder { /// This extension point allows adding optimizations before the function /// optimization pipeline. 
void registerOptimizerEarlyEPCallback( - const std::function &C) { + const std::function &C) { OptimizerEarlyEPCallbacks.push_back(C); } @@ -499,7 +500,8 @@ class PassBuilder { /// This extension point allows adding optimizations at the very end of the /// function optimization pipeline. void registerOptimizerLastEPCallback( - const std::function &C) { + const std::function &C) { OptimizerLastEPCallbacks.push_back(C); } @@ -630,9 +632,11 @@ class PassBuilder { void invokeVectorizerStartEPCallbacks(FunctionPassManager &FPM, OptimizationLevel Level); void invokeOptimizerEarlyEPCallbacks(ModulePassManager &MPM, - OptimizationLevel Level); + OptimizationLevel Level, + ThinOrFullLTOPhase Phase); void invokeOptimizerLastEPCallbacks(ModulePassManager &MPM, - OptimizationLevel Level); + Optimizatio
[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline (PR #114577)
https://github.com/shiltian created https://github.com/llvm/llvm-project/pull/114577 None >From dc94afc308989a4dbaee911f93f1cc1855bd7c55 Mon Sep 17 00:00:00 2001 From: Shilei Tian Date: Fri, 1 Nov 2024 12:39:52 -0400 Subject: [PATCH] [PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline --- clang/lib/CodeGen/BackendUtil.cpp | 22 + llvm/include/llvm/Passes/PassBuilder.h| 20 +++- llvm/lib/Passes/PassBuilderPipelines.cpp | 24 +++ .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 16 - .../CodeGen/AMDGPU/print-pipeline-passes.ll | 1 + llvm/tools/opt/NewPMDriver.cpp| 4 ++-- 6 files changed, 53 insertions(+), 34 deletions(-) diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp index 47a30f00612eb7..70035a5e069a90 100644 --- a/clang/lib/CodeGen/BackendUtil.cpp +++ b/clang/lib/CodeGen/BackendUtil.cpp @@ -674,7 +674,7 @@ static void addKCFIPass(const Triple &TargetTriple, const LangOptions &LangOpts, // Ensure we lower KCFI operand bundles with -O0. PB.registerOptimizerLastEPCallback( - [&](ModulePassManager &MPM, OptimizationLevel Level) { + [&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) { if (Level == OptimizationLevel::O0 && LangOpts.Sanitize.has(SanitizerKind::KCFI)) MPM.addPass(createModuleToFunctionPassAdaptor(KCFIPass())); @@ -693,8 +693,8 @@ static void addKCFIPass(const Triple &TargetTriple, const LangOptions &LangOpts, static void addSanitizers(const Triple &TargetTriple, const CodeGenOptions &CodeGenOpts, const LangOptions &LangOpts, PassBuilder &PB) { - auto SanitizersCallback = [&](ModulePassManager &MPM, -OptimizationLevel Level) { + auto SanitizersCallback = [&](ModulePassManager &MPM, OptimizationLevel Level, +ThinOrFullLTOPhase) { if (CodeGenOpts.hasSanitizeCoverage()) { auto SancovOpts = getSancovOptsFromCGOpts(CodeGenOpts); MPM.addPass(SanitizerCoveragePass( @@ -778,9 +778,10 @@ static void addSanitizers(const Triple &TargetTriple, }; if (ClSanitizeOnOptimizerEarlyEP) { PB.registerOptimizerEarlyEPCallback( -[SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level) { +[SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level, + ThinOrFullLTOPhase Phase) { ModulePassManager NewMPM; - SanitizersCallback(NewMPM, Level); + SanitizersCallback(NewMPM, Level, Phase); if (!NewMPM.isEmpty()) { // Sanitizers can abandon. NewMPM.addPass(RequireAnalysisPass()); @@ -1058,11 +1059,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline( // TODO: Consider passing the MemoryProfileOutput to the pass builder via // the PGOOptions, and set this up there. if (!CodeGenOpts.MemoryProfileOutput.empty()) { - PB.registerOptimizerLastEPCallback( - [](ModulePassManager &MPM, OptimizationLevel Level) { -MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass())); -MPM.addPass(ModuleMemProfilerPass()); - }); + PB.registerOptimizerLastEPCallback([](ModulePassManager &MPM, +OptimizationLevel Level, +ThinOrFullLTOPhase) { +MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass())); +MPM.addPass(ModuleMemProfilerPass()); + }); } if (CodeGenOpts.FatLTO) { diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h index 565fd2ab2147e5..e7bc3a58f414f1 100644 --- a/llvm/include/llvm/Passes/PassBuilder.h +++ b/llvm/include/llvm/Passes/PassBuilder.h @@ -490,7 +490,8 @@ class PassBuilder { /// This extension point allows adding optimizations before the function /// optimization pipeline. 
void registerOptimizerEarlyEPCallback( - const std::function &C) { + const std::function &C) { OptimizerEarlyEPCallbacks.push_back(C); } @@ -499,7 +500,8 @@ class PassBuilder { /// This extension point allows adding optimizations at the very end of the /// function optimization pipeline. void registerOptimizerLastEPCallback( - const std::function &C) { + const std::function &C) { OptimizerLastEPCallbacks.push_back(C); } @@ -630,9 +632,11 @@ class PassBuilder { void invokeVectorizerStartEPCallbacks(FunctionPassManager &FPM, OptimizationLevel Level); void invokeOptimizerEarlyEPCallbacks(ModulePassManager &MPM, - OptimizationLevel Level); + OptimizationLevel Level, + ThinOrFullLTOPhase Phase); void inv
[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to optimizer pipeline (PR #114577)
shiltian wrote: > [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/114577).

* **#114577** 👈 (this PR)
* **#114547**
* **#114564**
* `main`

This stack of pull requests is managed by Graphite (https://stacking.dev). https://github.com/llvm/llvm-project/pull/114577
[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (PR #114547)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/114547
[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (PR #114547)
https://github.com/shiltian updated https://github.com/llvm/llvm-project/pull/114547 >From e753c4fadf85f1730a458804bec41d32df5a692b Mon Sep 17 00:00:00 2001 From: Shilei Tian Date: Fri, 1 Nov 2024 10:51:20 -0400 Subject: [PATCH] [PassBuilder] Add `LTOPreLink` to early simplication EP call backs The early simplication pipeline is used in non-LTO and (Thin/Full)LTO pre-link stage. There are some passes that we want them in non-LTO mode, but not at LTO pre-link stage. The control is missing currently. This PR adds the support. To demonstrate the use, we only enable the internalization pass in non-LTO mode for AMDGPU because having it run in pre-link stage causes some issues. --- clang/lib/CodeGen/BackendUtil.cpp | 3 ++- llvm/include/llvm/Passes/PassBuilder.h| 10 +++--- llvm/lib/Passes/PassBuilderPipelines.cpp | 8 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 19 +++ llvm/lib/Target/BPF/BPFTargetMachine.cpp | 2 +- .../CodeGen/AMDGPU/print-pipeline-passes.ll | 8 llvm/tools/opt/NewPMDriver.cpp| 2 +- 7 files changed, 38 insertions(+), 14 deletions(-) diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp index ae33554a66b6b5..47a30f00612eb7 100644 --- a/clang/lib/CodeGen/BackendUtil.cpp +++ b/clang/lib/CodeGen/BackendUtil.cpp @@ -993,7 +993,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline( createModuleToFunctionPassAdaptor(ObjCARCExpandPass())); }); PB.registerPipelineEarlySimplificationEPCallback( - [](ModulePassManager &MPM, OptimizationLevel Level) { + [](ModulePassManager &MPM, OptimizationLevel Level, + ThinOrFullLTOPhase) { if (Level != OptimizationLevel::O0) MPM.addPass(ObjCARCAPElimPass()); }); diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h index 0ebfdbb7865fdd..565fd2ab2147e5 100644 --- a/llvm/include/llvm/Passes/PassBuilder.h +++ b/llvm/include/llvm/Passes/PassBuilder.h @@ -480,7 +480,8 @@ class PassBuilder { /// This extension point allows adding optimization right after passes that do /// basic simplification of the input IR. void registerPipelineEarlySimplificationEPCallback( - const std::function &C) { + const std::function &C) { PipelineEarlySimplificationEPCallbacks.push_back(C); } @@ -639,7 +640,8 @@ class PassBuilder { void invokePipelineStartEPCallbacks(ModulePassManager &MPM, OptimizationLevel Level); void invokePipelineEarlySimplificationEPCallbacks(ModulePassManager &MPM, -OptimizationLevel Level); +OptimizationLevel Level, +ThinOrFullLTOPhase Phase); static bool checkParametrizedPassName(StringRef Name, StringRef PassName) { if (!Name.consume_front(PassName)) @@ -764,7 +766,9 @@ class PassBuilder { FullLinkTimeOptimizationLastEPCallbacks; SmallVector, 2> PipelineStartEPCallbacks; - SmallVector, 2> + SmallVector, + 2> PipelineEarlySimplificationEPCallbacks; SmallVector, 2> diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp index 7c512ab15a6d38..bfb9678678f18a 100644 --- a/llvm/lib/Passes/PassBuilderPipelines.cpp +++ b/llvm/lib/Passes/PassBuilderPipelines.cpp @@ -384,9 +384,9 @@ void PassBuilder::invokePipelineStartEPCallbacks(ModulePassManager &MPM, C(MPM, Level); } void PassBuilder::invokePipelineEarlySimplificationEPCallbacks( -ModulePassManager &MPM, OptimizationLevel Level) { +ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase Phase) { for (auto &C : PipelineEarlySimplificationEPCallbacks) -C(MPM, Level); +C(MPM, Level, Phase); } // Helper to add AnnotationRemarksPass. 
@@ -1140,7 +1140,7 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level, MPM.addPass(LowerTypeTestsPass(nullptr, nullptr, lowertypetests::DropTestKind::Assume)); - invokePipelineEarlySimplificationEPCallbacks(MPM, Level); + invokePipelineEarlySimplificationEPCallbacks(MPM, Level, Phase); // Interprocedural constant propagation now that basic cleanup has occurred // and prior to optimizing globals. @@ -2155,7 +2155,7 @@ PassBuilder::buildO0DefaultPipeline(OptimizationLevel Level, if (PGOOpt && PGOOpt->DebugInfoForProfiling) MPM.addPass(createModuleToFunctionPassAdaptor(AddDiscriminatorsPass())); - invokePipelineEarlySimplificationEPCallbacks(MPM, Level); + invokePipelineEarlySimplificationEPCallbacks(MPM, Level, Phase); // Build a minimal pipeline based on the semantics required by LLVM, // which is just that always inlining occurs. Further, disable generating diff --
[llvm-branch-commits] [flang] [flang][cuda] Data transfer with descriptor (PR #114302)
https://github.com/clementval closed https://github.com/llvm/llvm-project/pull/114302
[llvm-branch-commits] [flang] 704c0b8 - Revert "[flang][runtime][NFC] Allow different memmove function in assign (#11…"
Author: Valentin Clement (バレンタイン クレメン) Date: 2024-11-01T10:39:56-07:00 New Revision: 704c0b8e429443150ef4b58fc654ef6087f90e03 URL: https://github.com/llvm/llvm-project/commit/704c0b8e429443150ef4b58fc654ef6087f90e03 DIFF: https://github.com/llvm/llvm-project/commit/704c0b8e429443150ef4b58fc654ef6087f90e03.diff LOG: Revert "[flang][runtime][NFC] Allow different memmove function in assign (#11…" This reverts commit b278fe3297557c8db492e2d90b4ea9fe683fa479. Added: Modified: flang/include/flang/Runtime/assign.h flang/runtime/assign.cpp Removed: diff --git a/flang/include/flang/Runtime/assign.h b/flang/include/flang/Runtime/assign.h index 331ec0516dd2d5..a1cc9eaf4355f6 100644 --- a/flang/include/flang/Runtime/assign.h +++ b/flang/include/flang/Runtime/assign.h @@ -24,35 +24,11 @@ #define FORTRAN_RUNTIME_ASSIGN_H_ #include "flang/Runtime/entry-names.h" -#include "flang/Runtime/freestanding-tools.h" namespace Fortran::runtime { class Descriptor; -class Terminator; - -enum AssignFlags { - NoAssignFlags = 0, - MaybeReallocate = 1 << 0, - NeedFinalization = 1 << 1, - CanBeDefinedAssignment = 1 << 2, - ComponentCanBeDefinedAssignment = 1 << 3, - ExplicitLengthCharacterLHS = 1 << 4, - PolymorphicLHS = 1 << 5, - DeallocateLHS = 1 << 6 -}; - -using MemmoveFct = void *(*)(void *, const void *, std::size_t); - -static RT_API_ATTRS void *MemmoveWrapper( -void *dest, const void *src, std::size_t count) { - return Fortran::runtime::memmove(dest, src, count); -} - -RT_API_ATTRS void Assign(Descriptor &to, const Descriptor &from, -Terminator &terminator, int flags, MemmoveFct memmoveFct = &MemmoveWrapper); extern "C" { - // API for lowering assignment void RTDECL(Assign)(Descriptor &to, const Descriptor &from, const char *sourceFile = nullptr, int sourceLine = 0); diff --git a/flang/runtime/assign.cpp b/flang/runtime/assign.cpp index 8f31fc4d127168..d558ada51cd21a 100644 --- a/flang/runtime/assign.cpp +++ b/flang/runtime/assign.cpp @@ -17,6 +17,17 @@ namespace Fortran::runtime { +enum AssignFlags { + NoAssignFlags = 0, + MaybeReallocate = 1 << 0, + NeedFinalization = 1 << 1, + CanBeDefinedAssignment = 1 << 2, + ComponentCanBeDefinedAssignment = 1 << 3, + ExplicitLengthCharacterLHS = 1 << 4, + PolymorphicLHS = 1 << 5, + DeallocateLHS = 1 << 6 +}; + // Predicate: is the left-hand side of an assignment an allocated allocatable // that must be deallocated? static inline RT_API_ATTRS bool MustDeallocateLHS( @@ -239,8 +250,8 @@ static RT_API_ATTRS void BlankPadCharacterAssignment(Descriptor &to, // of elements, but their shape need not to conform (the assignment is done in // element sequence order). This facilitates some internal usages, like when // dealing with array constructors. 
-RT_API_ATTRS void Assign(Descriptor &to, const Descriptor &from, -Terminator &terminator, int flags, MemmoveFct memmoveFct) { +RT_API_ATTRS static void Assign( +Descriptor &to, const Descriptor &from, Terminator &terminator, int flags) { bool mustDeallocateLHS{(flags & DeallocateLHS) || MustDeallocateLHS(to, from, terminator, flags)}; DescriptorAddendum *toAddendum{to.Addendum()}; @@ -412,14 +423,14 @@ RT_API_ATTRS void Assign(Descriptor &to, const Descriptor &from, Assign(toCompDesc, fromCompDesc, terminator, nestedFlags); } else { // Component has intrinsic type; simply copy raw bytes std::size_t componentByteSize{comp.SizeInBytes(to)}; -memmoveFct(to.Element(toAt) + comp.offset(), +Fortran::runtime::memmove(to.Element(toAt) + comp.offset(), from.Element(fromAt) + comp.offset(), componentByteSize); } break; case typeInfo::Component::Genre::Pointer: { std::size_t componentByteSize{comp.SizeInBytes(to)}; - memmoveFct(to.Element(toAt) + comp.offset(), + Fortran::runtime::memmove(to.Element(toAt) + comp.offset(), from.Element(fromAt) + comp.offset(), componentByteSize); } break; @@ -465,14 +476,14 @@ RT_API_ATTRS void Assign(Descriptor &to, const Descriptor &from, const auto &procPtr{ *procPtrDesc.ZeroBasedIndexedElement( k)}; -memmoveFct(to.Element(toAt) + procPtr.offset, +Fortran::runtime::memmove(to.Element(toAt) + procPtr.offset, from.Element(fromAt) + procPtr.offset, sizeof(typeInfo::ProcedurePointer)); } } } else { // intrinsic type, intrinsic assignment if (isSimpleMemmove()) { - memmoveFct(to.raw().base_addr, from.raw().base_addr, + Fortran::runtime::memmove(to.raw().base_addr, from.raw().base_addr, toElements * toElementBytes); } else if (toElementBytes > fromElementBytes) { // blank padding switch (to.type().raw()) { @@ -496,8 +
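(Truncated above.) For context on what the revert removes: the original patch parameterized the runtime's `Assign` over a memmove-style callback, defaulting to `Fortran::runtime::memmove`, so callers such as the CUDA Fortran data-transfer path could substitute a device-aware copy. A stripped-down sketch of that injection pattern, with stand-in types instead of the runtime's `Descriptor` machinery:

```cpp
#include <cstddef>
#include <cstring>

using MemmoveFct = void *(*)(void *dest, const void *src, std::size_t count);

// Default byte mover: plain host memmove.
static void *hostMemmove(void *dest, const void *src, std::size_t count) {
  return std::memmove(dest, src, count);
}

struct Buffer {
  void *data;
  std::size_t bytes;
};

// Every raw byte copy inside the assignment funnels through the callback,
// so an offload runtime can pass e.g. a device-to-host copy instead.
void assign(Buffer &to, const Buffer &from,
            MemmoveFct memmoveFct = hostMemmove) {
  memmoveFct(to.data, from.data, from.bytes);
}
```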
[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (PR #114547)
https://github.com/shiltian updated https://github.com/llvm/llvm-project/pull/114547 >From 8ae74a4c6a96eb0c44668d571aa61116eaa48cbe Mon Sep 17 00:00:00 2001 From: Shilei Tian Date: Fri, 1 Nov 2024 10:51:20 -0400 Subject: [PATCH] [PassBuilder] Add `LTOPreLink` to early simplication EP call backs The early simplication pipeline is used in non-LTO and (Thin/Full)LTO pre-link stage. There are some passes that we want them in non-LTO mode, but not at LTO pre-link stage. The control is missing currently. This PR adds the support. To demonstrate the use, we only enable the internalization pass in non-LTO mode for AMDGPU because having it run in pre-link stage causes some issues. --- clang/lib/CodeGen/BackendUtil.cpp | 3 ++- llvm/include/llvm/Passes/PassBuilder.h| 12 llvm/lib/Passes/PassBuilderPipelines.cpp | 8 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 19 +++ llvm/lib/Target/BPF/BPFTargetMachine.cpp | 2 +- .../CodeGen/AMDGPU/print-pipeline-passes.ll | 8 llvm/tools/opt/NewPMDriver.cpp| 2 +- 7 files changed, 39 insertions(+), 15 deletions(-) diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp index ae33554a66b6b5..47a30f00612eb7 100644 --- a/clang/lib/CodeGen/BackendUtil.cpp +++ b/clang/lib/CodeGen/BackendUtil.cpp @@ -993,7 +993,8 @@ void EmitAssemblyHelper::RunOptimizationPipeline( createModuleToFunctionPassAdaptor(ObjCARCExpandPass())); }); PB.registerPipelineEarlySimplificationEPCallback( - [](ModulePassManager &MPM, OptimizationLevel Level) { + [](ModulePassManager &MPM, OptimizationLevel Level, + ThinOrFullLTOPhase) { if (Level != OptimizationLevel::O0) MPM.addPass(ObjCARCAPElimPass()); }); diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h index 0ebfdbb7865fdd..268df03615db23 100644 --- a/llvm/include/llvm/Passes/PassBuilder.h +++ b/llvm/include/llvm/Passes/PassBuilder.h @@ -480,7 +480,8 @@ class PassBuilder { /// This extension point allows adding optimization right after passes that do /// basic simplification of the input IR. 
void registerPipelineEarlySimplificationEPCallback( - const std::function &C) { + const std::function &C) { PipelineEarlySimplificationEPCallbacks.push_back(C); } @@ -638,8 +639,9 @@ class PassBuilder { OptimizationLevel Level); void invokePipelineStartEPCallbacks(ModulePassManager &MPM, OptimizationLevel Level); - void invokePipelineEarlySimplificationEPCallbacks(ModulePassManager &MPM, -OptimizationLevel Level); + void invokePipelineEarlySimplificationEPCallbacks( + ModulePassManager &MPM, OptimizationLevel Level, + ThinOrFullLTOPhase Phase = ThinOrFullLTOPhase::None); static bool checkParametrizedPassName(StringRef Name, StringRef PassName) { if (!Name.consume_front(PassName)) @@ -764,7 +766,9 @@ class PassBuilder { FullLinkTimeOptimizationLastEPCallbacks; SmallVector, 2> PipelineStartEPCallbacks; - SmallVector, 2> + SmallVector, + 2> PipelineEarlySimplificationEPCallbacks; SmallVector, 2> diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp index 7c512ab15a6d38..bfb9678678f18a 100644 --- a/llvm/lib/Passes/PassBuilderPipelines.cpp +++ b/llvm/lib/Passes/PassBuilderPipelines.cpp @@ -384,9 +384,9 @@ void PassBuilder::invokePipelineStartEPCallbacks(ModulePassManager &MPM, C(MPM, Level); } void PassBuilder::invokePipelineEarlySimplificationEPCallbacks( -ModulePassManager &MPM, OptimizationLevel Level) { +ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase Phase) { for (auto &C : PipelineEarlySimplificationEPCallbacks) -C(MPM, Level); +C(MPM, Level, Phase); } // Helper to add AnnotationRemarksPass. @@ -1140,7 +1140,7 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level, MPM.addPass(LowerTypeTestsPass(nullptr, nullptr, lowertypetests::DropTestKind::Assume)); - invokePipelineEarlySimplificationEPCallbacks(MPM, Level); + invokePipelineEarlySimplificationEPCallbacks(MPM, Level, Phase); // Interprocedural constant propagation now that basic cleanup has occurred // and prior to optimizing globals. @@ -2155,7 +2155,7 @@ PassBuilder::buildO0DefaultPipeline(OptimizationLevel Level, if (PGOOpt && PGOOpt->DebugInfoForProfiling) MPM.addPass(createModuleToFunctionPassAdaptor(AddDiscriminatorsPass())); - invokePipelineEarlySimplificationEPCallbacks(MPM, Level); + invokePipelineEarlySimplificationEPCallbacks(MPM, Level, Phase); // Build a minimal pipeline based on the semantics require
[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (PR #114547)
@@ -821,8 +825,15 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) { PM.addPass(AMDGPUSwLowerLDSPass(*this)); if (EnableLowerModuleLDS) PM.addPass(AMDGPULowerModuleLDSPass(*this)); -if (EnableAMDGPUAttributor && Level != OptimizationLevel::O0) - PM.addPass(AMDGPUAttributorPass(*this)); +if (Level != OptimizationLevel::O0) { + if (EnableAMDGPUAttributor) +PM.addPass(AMDGPUAttributorPass(*this)); + // Do we really need internalization in LTO? + if (InternalizeSymbols) { arsenm wrote: Do we need the custom internalize anymore? I thought this was because of mis-set visibility in the libraries, but that was fixed? https://github.com/llvm/llvm-project/pull/114547
[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (PR #114547)
@@ -821,8 +825,15 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) { PM.addPass(AMDGPUSwLowerLDSPass(*this)); if (EnableLowerModuleLDS) PM.addPass(AMDGPULowerModuleLDSPass(*this)); -if (EnableAMDGPUAttributor && Level != OptimizationLevel::O0) - PM.addPass(AMDGPUAttributorPass(*this)); +if (Level != OptimizationLevel::O0) { + if (EnableAMDGPUAttributor) +PM.addPass(AMDGPUAttributorPass(*this)); + // Do we really need internalization in LTO? + if (InternalizeSymbols) { shiltian wrote: Well, we probably still need that for those that don't use LTO, such as `comgr`. https://github.com/llvm/llvm-project/pull/114547
[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Make `AAAMDWavesPerEU` honor existing attribute (PR #114438)
https://github.com/shiltian updated https://github.com/llvm/llvm-project/pull/114438 >From 7181479ee055c0c8d15a674d577a9cd694e21621 Mon Sep 17 00:00:00 2001 From: Shilei Tian Date: Thu, 31 Oct 2024 12:49:07 -0400 Subject: [PATCH] [WIP][AMDGPU][Attributor] Make `AAAMDWavesPerEU` honor existing attribute --- llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | 79 +++ .../annotate-kernel-features-hsa-call.ll | 46 +-- .../AMDGPU/attributor-loop-issue-58639.ll | 3 +- .../CodeGen/AMDGPU/direct-indirect-call.ll| 3 +- .../CodeGen/AMDGPU/propagate-waves-per-eu.ll | 59 +++--- .../AMDGPU/remove-no-kernel-id-attribute.ll | 9 ++- .../AMDGPU/uniform-work-group-multistep.ll| 3 +- .../uniform-work-group-recursion-test.ll | 2 +- 8 files changed, 111 insertions(+), 93 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp index 066003395af3f2..18b617d17bec5c 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp @@ -198,6 +198,17 @@ class AMDGPUInformationCache : public InformationCache { return ST.getWavesPerEU(F, FlatWorkGroupSize); } + std::optional> + getWavesPerEUAttr(const Function &F) { +auto Val = AMDGPU::getIntegerPairAttribute(F, "amdgpu-waves-per-eu", + /*OnlyFirstRequired=*/true); +if (Val && Val->second == 0) { + const GCNSubtarget &ST = TM.getSubtarget(F); + Val->second = ST.getMaxWavesPerEU(); +} +return Val; + } + std::pair getEffectiveWavesPerEU(const Function &F, std::pair WavesPerEU, @@ -768,22 +779,6 @@ struct AAAMDSizeRangeAttribute /*ForceReplace=*/true); } - ChangeStatus emitAttributeIfNotDefault(Attributor &A, unsigned Min, - unsigned Max) { -// Don't add the attribute if it's the implied default. -if (getAssumed().getLower() == Min && getAssumed().getUpper() - 1 == Max) - return ChangeStatus::UNCHANGED; - -Function *F = getAssociatedFunction(); -LLVMContext &Ctx = F->getContext(); -SmallString<10> Buffer; -raw_svector_ostream OS(Buffer); -OS << getAssumed().getLower() << ',' << getAssumed().getUpper() - 1; -return A.manifestAttrs(getIRPosition(), - {Attribute::get(Ctx, AttrName, OS.str())}, - /*ForceReplace=*/true); - } - const std::string getAsStr(Attributor *) const override { std::string Str; raw_string_ostream OS(Str); @@ -880,29 +875,47 @@ struct AAAMDWavesPerEU : public AAAMDSizeRangeAttribute { AAAMDWavesPerEU(const IRPosition &IRP, Attributor &A) : AAAMDSizeRangeAttribute(IRP, A, "amdgpu-waves-per-eu") {} - bool isValidState() const override { -return !Assumed.isEmptySet() && IntegerRangeState::isValidState(); - } - void initialize(Attributor &A) override { Function *F = getAssociatedFunction(); auto &InfoCache = static_cast(A.getInfoCache()); -if (const auto *AssumedGroupSize = A.getAAFor( -*this, IRPosition::function(*F), DepClassTy::REQUIRED); -AssumedGroupSize->isValidState()) { +auto TakeRange = [&](std::pair R) { + auto [Min, Max] = R; + ConstantRange Range(APInt(32, Min), APInt(32, Max + 1)); + IntegerRangeState RangeState(Range); + clampStateAndIndicateChange(this->getState(), RangeState); + indicateOptimisticFixpoint(); +}; - unsigned Min, Max; - std::tie(Min, Max) = InfoCache.getWavesPerEU( - *F, {AssumedGroupSize->getAssumed().getLower().getZExtValue(), - AssumedGroupSize->getAssumed().getUpper().getZExtValue() - 1}); +std::pair MaxWavesPerEURange{ +1U, InfoCache.getMaxWavesPerEU(*F)}; - ConstantRange Range(APInt(32, Min), APInt(32, Max + 1)); - intersectKnown(Range); +// If the attribute exists, we will honor it if it is not the default. 
+if (auto Attr = InfoCache.getWavesPerEUAttr(*F)) { + if (*Attr != MaxWavesPerEURange) { +TakeRange(*Attr); +return; + } } -if (AMDGPU::isEntryFunctionCC(F->getCallingConv())) - indicatePessimisticFixpoint(); +// Unlike AAAMDFlatWorkGroupSize, it's getting trickier here. Since the +// calculation of waves per EU involves flat work group size, we can't +// simply use an assumed flat work group size as a start point, because the +// update of flat work group size is in an inverse direction of waves per +// EU. However, we can still do something if it is an entry function. Since +// an entry function is a terminal node, and flat work group size either +// from attribute or default will be used anyway, we can take that value and +// calculate the waves per EU based on it. This result can't be updated by +// any means, but that could still allow us
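For readers unfamiliar with the attribute format: `"amdgpu-waves-per-eu"` carries a `"min[,max]"` integer pair, and `getWavesPerEUAttr` above normalizes a missing or zero upper bound to the subtarget maximum. The standalone sketch below models only that normalization; the parsing helper and the fixed maximum of 10 are illustrative assumptions, not the Attributor's actual API.

```cpp
#include <cstdio>
#include <optional>
#include <string>
#include <utility>

// Parse "min[,max]" as in "amdgpu-waves-per-eu"; a missing or zero upper
// bound falls back to the subtarget maximum, mirroring getWavesPerEUAttr.
static std::optional<std::pair<unsigned, unsigned>>
parseWavesPerEU(const std::string &Attr, unsigned MaxWavesPerEU) {
  unsigned Min = 0, Max = 0;
  int Matched = std::sscanf(Attr.c_str(), "%u,%u", &Min, &Max);
  if (Matched < 1)
    return std::nullopt;  // attribute absent or malformed
  if (Matched < 2 || Max == 0)
    Max = MaxWavesPerEU;  // "OnlyFirstRequired": default the upper bound
  return std::make_pair(Min, Max);
}

int main() {
  // 10 stands in for ST.getMaxWavesPerEU(); an assumption for illustration.
  if (auto R = parseWavesPerEU("2", 10))
    std::printf("honored range: [%u, %u]\n", R->first, R->second); // [2, 10]
}
```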
[llvm-branch-commits] [clang] release/19.x: [clang-format] Fix a regression in parsing `switch` in macro call (#114506) (PR #114640)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/114640 Backport 6ca816f88d5f0f2032d1610207023133eaf40a1e Requested by: @owenca >From 628477ce78cf2460ef3ec075494dcbbb67f8f7c8 Mon Sep 17 00:00:00 2001 From: Owen Pan Date: Fri, 1 Nov 2024 18:47:50 -0700 Subject: [PATCH] [clang-format] Fix a regression in parsing `switch` in macro call (#114506) Fixes #114408. (cherry picked from commit 6ca816f88d5f0f2032d1610207023133eaf40a1e) --- clang/lib/Format/UnwrappedLineParser.cpp | 8 ++-- clang/unittests/Format/TokenAnnotatorTest.cpp | 7 +++ 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/clang/lib/Format/UnwrappedLineParser.cpp b/clang/lib/Format/UnwrappedLineParser.cpp index a5268e153bcc5b..bfb592ae074938 100644 --- a/clang/lib/Format/UnwrappedLineParser.cpp +++ b/clang/lib/Format/UnwrappedLineParser.cpp @@ -2086,7 +2086,8 @@ void UnwrappedLineParser::parseStructuralElement( case tok::kw_switch: if (Style.Language == FormatStyle::LK_Java) parseSwitch(/*IsExpr=*/true); - nextToken(); + else +nextToken(); break; case tok::kw_case: // Proto: there are no switch/case statements. @@ -2637,7 +2638,10 @@ bool UnwrappedLineParser::parseParens(TokenType AmpAmpTokenType) { nextToken(); break; case tok::kw_switch: - parseSwitch(/*IsExpr=*/true); + if (Style.Language == FormatStyle::LK_Java) +parseSwitch(/*IsExpr=*/true); + else +nextToken(); break; case tok::kw_requires: { auto RequiresToken = FormatTok; diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp index 4acd900ff061f8..07999116ab0cf0 100644 --- a/clang/unittests/Format/TokenAnnotatorTest.cpp +++ b/clang/unittests/Format/TokenAnnotatorTest.cpp @@ -3412,6 +3412,13 @@ TEST_F(TokenAnnotatorTest, TemplateInstantiation) { EXPECT_TOKEN(Tokens[18], tok::greater, TT_TemplateCloser); } +TEST_F(TokenAnnotatorTest, SwitchInMacroArgument) { + auto Tokens = annotate("FOOBAR(switch);\n" + "void f() {}"); + ASSERT_EQ(Tokens.size(), 12u) << Tokens; + EXPECT_TOKEN(Tokens[9], tok::l_brace, TT_FunctionLBrace); +} + } // namespace } // namespace format } // namespace clang
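To see the shape of input the new test guards, here is a compilable C++ translation unit; `FOOBAR` is a placeholder macro name (the unit test uses the same spelling), and in C++ the `switch` keyword inside a macro argument is just a token, so the parser must not treat it as Java's switch expression.

```cpp
#define FOOBAR(x) /* placeholder macro; only the token shape matters */
FOOBAR(switch);   // `switch` here is a plain macro argument, not Java's
                  // switch expression, so only this token may be consumed
void f() {}       // this l_brace must keep its TT_FunctionLBrace annotation
```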
[llvm-branch-commits] [clang] release/19.x: [clang-format] Fix a regression in parsing `switch` in macro call (#114506) (PR #114640)
llvmbot wrote: @HazardyKnusperkeks What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/114640
[llvm-branch-commits] [clang] release/19.x: [clang-format] Fix a regression in parsing `switch` in macro call (#114506) (PR #114640)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/114640
[llvm-branch-commits] [clang] release/19.x: [clang-format] Fix a regression in parsing `switch` in macro call (#114506) (PR #114640)
llvmbot wrote: @llvm/pr-subscribers-clang-format Author: None (llvmbot) Changes Backport 6ca816f88d5f0f2032d1610207023133eaf40a1e Requested by: @owenca --- Full diff: https://github.com/llvm/llvm-project/pull/114640.diff 2 Files Affected: - (modified) clang/lib/Format/UnwrappedLineParser.cpp (+6-2) - (modified) clang/unittests/Format/TokenAnnotatorTest.cpp (+7) ``diff diff --git a/clang/lib/Format/UnwrappedLineParser.cpp b/clang/lib/Format/UnwrappedLineParser.cpp index a5268e153bcc5b..bfb592ae074938 100644 --- a/clang/lib/Format/UnwrappedLineParser.cpp +++ b/clang/lib/Format/UnwrappedLineParser.cpp @@ -2086,7 +2086,8 @@ void UnwrappedLineParser::parseStructuralElement( case tok::kw_switch: if (Style.Language == FormatStyle::LK_Java) parseSwitch(/*IsExpr=*/true); - nextToken(); + else +nextToken(); break; case tok::kw_case: // Proto: there are no switch/case statements. @@ -2637,7 +2638,10 @@ bool UnwrappedLineParser::parseParens(TokenType AmpAmpTokenType) { nextToken(); break; case tok::kw_switch: - parseSwitch(/*IsExpr=*/true); + if (Style.Language == FormatStyle::LK_Java) +parseSwitch(/*IsExpr=*/true); + else +nextToken(); break; case tok::kw_requires: { auto RequiresToken = FormatTok; diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp index 4acd900ff061f8..07999116ab0cf0 100644 --- a/clang/unittests/Format/TokenAnnotatorTest.cpp +++ b/clang/unittests/Format/TokenAnnotatorTest.cpp @@ -3412,6 +3412,13 @@ TEST_F(TokenAnnotatorTest, TemplateInstantiation) { EXPECT_TOKEN(Tokens[18], tok::greater, TT_TemplateCloser); } +TEST_F(TokenAnnotatorTest, SwitchInMacroArgument) { + auto Tokens = annotate("FOOBAR(switch);\n" + "void f() {}"); + ASSERT_EQ(Tokens.size(), 12u) << Tokens; + EXPECT_TOKEN(Tokens[9], tok::l_brace, TT_FunctionLBrace); +} + } // namespace } // namespace format } // namespace clang `` https://github.com/llvm/llvm-project/pull/114640
[llvm-branch-commits] [clang] [llvm] [PassBuilder] Add `ThinOrFullLTOPhase` to early simplication EP call backs (PR #114547)
@@ -821,8 +825,15 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) { PM.addPass(AMDGPUSwLowerLDSPass(*this)); if (EnableLowerModuleLDS) PM.addPass(AMDGPULowerModuleLDSPass(*this)); -if (EnableAMDGPUAttributor && Level != OptimizationLevel::O0) - PM.addPass(AMDGPUAttributorPass(*this)); +if (Level != OptimizationLevel::O0) { + if (EnableAMDGPUAttributor) +PM.addPass(AMDGPUAttributorPass(*this)); + // Do we really need internalization in LTO? + if (InternalizeSymbols) { shiltian wrote: I'm not sure. I can start a PSDB session to test whether removing it makes a difference. https://github.com/llvm/llvm-project/pull/114547
[llvm-branch-commits] [llvm] [RISCV] Add initial support of memcmp expansion (PR #107548)
@@ -1144,42 +2872,116 @@ entry: define i32 @memcmp_size_4(ptr %s1, ptr %s2) nounwind { ; CHECK-ALIGNED-RV32-LABEL: memcmp_size_4: ; CHECK-ALIGNED-RV32: # %bb.0: # %entry -; CHECK-ALIGNED-RV32-NEXT:addi sp, sp, -16 -; CHECK-ALIGNED-RV32-NEXT:sw ra, 12(sp) # 4-byte Folded Spill -; CHECK-ALIGNED-RV32-NEXT:li a2, 4 -; CHECK-ALIGNED-RV32-NEXT:call memcmp -; CHECK-ALIGNED-RV32-NEXT:lw ra, 12(sp) # 4-byte Folded Reload -; CHECK-ALIGNED-RV32-NEXT:addi sp, sp, 16 +; CHECK-ALIGNED-RV32-NEXT:lbu a2, 0(a0) +; CHECK-ALIGNED-RV32-NEXT:lbu a3, 1(a0) +; CHECK-ALIGNED-RV32-NEXT:lbu a4, 3(a0) +; CHECK-ALIGNED-RV32-NEXT:lbu a0, 2(a0) +; CHECK-ALIGNED-RV32-NEXT:lbu a5, 0(a1) +; CHECK-ALIGNED-RV32-NEXT:lbu a6, 1(a1) +; CHECK-ALIGNED-RV32-NEXT:lbu a7, 3(a1) +; CHECK-ALIGNED-RV32-NEXT:lbu a1, 2(a1) +; CHECK-ALIGNED-RV32-NEXT:slli a0, a0, 8 +; CHECK-ALIGNED-RV32-NEXT:or a0, a0, a4 +; CHECK-ALIGNED-RV32-NEXT:slli a3, a3, 16 +; CHECK-ALIGNED-RV32-NEXT:slli a2, a2, 24 +; CHECK-ALIGNED-RV32-NEXT:or a2, a2, a3 +; CHECK-ALIGNED-RV32-NEXT:or a0, a2, a0 +; CHECK-ALIGNED-RV32-NEXT:slli a1, a1, 8 +; CHECK-ALIGNED-RV32-NEXT:or a1, a1, a7 +; CHECK-ALIGNED-RV32-NEXT:slli a6, a6, 16 +; CHECK-ALIGNED-RV32-NEXT:slli a5, a5, 24 +; CHECK-ALIGNED-RV32-NEXT:or a2, a5, a6 +; CHECK-ALIGNED-RV32-NEXT:or a1, a2, a1 +; CHECK-ALIGNED-RV32-NEXT:sltu a2, a1, a0 +; CHECK-ALIGNED-RV32-NEXT:sltu a0, a0, a1 +; CHECK-ALIGNED-RV32-NEXT:sub a0, a2, a0 ; CHECK-ALIGNED-RV32-NEXT:ret ; ; CHECK-ALIGNED-RV64-LABEL: memcmp_size_4: ; CHECK-ALIGNED-RV64: # %bb.0: # %entry -; CHECK-ALIGNED-RV64-NEXT:addi sp, sp, -16 -; CHECK-ALIGNED-RV64-NEXT:sd ra, 8(sp) # 8-byte Folded Spill -; CHECK-ALIGNED-RV64-NEXT:li a2, 4 -; CHECK-ALIGNED-RV64-NEXT:call memcmp -; CHECK-ALIGNED-RV64-NEXT:ld ra, 8(sp) # 8-byte Folded Reload -; CHECK-ALIGNED-RV64-NEXT:addi sp, sp, 16 +; CHECK-ALIGNED-RV64-NEXT:lbu a2, 0(a0) +; CHECK-ALIGNED-RV64-NEXT:lbu a3, 1(a0) +; CHECK-ALIGNED-RV64-NEXT:lbu a4, 2(a0) +; CHECK-ALIGNED-RV64-NEXT:lb a0, 3(a0) +; CHECK-ALIGNED-RV64-NEXT:lbu a5, 0(a1) +; CHECK-ALIGNED-RV64-NEXT:lbu a6, 1(a1) +; CHECK-ALIGNED-RV64-NEXT:lbu a7, 2(a1) +; CHECK-ALIGNED-RV64-NEXT:lb a1, 3(a1) +; CHECK-ALIGNED-RV64-NEXT:andi a0, a0, 255 +; CHECK-ALIGNED-RV64-NEXT:slli a4, a4, 8 +; CHECK-ALIGNED-RV64-NEXT:or a0, a4, a0 +; CHECK-ALIGNED-RV64-NEXT:slli a3, a3, 16 +; CHECK-ALIGNED-RV64-NEXT:slliw a2, a2, 24 +; CHECK-ALIGNED-RV64-NEXT:or a2, a2, a3 +; CHECK-ALIGNED-RV64-NEXT:or a0, a2, a0 +; CHECK-ALIGNED-RV64-NEXT:andi a1, a1, 255 +; CHECK-ALIGNED-RV64-NEXT:slli a7, a7, 8 +; CHECK-ALIGNED-RV64-NEXT:or a1, a7, a1 +; CHECK-ALIGNED-RV64-NEXT:slli a6, a6, 16 +; CHECK-ALIGNED-RV64-NEXT:slliw a2, a5, 24 +; CHECK-ALIGNED-RV64-NEXT:or a2, a2, a6 +; CHECK-ALIGNED-RV64-NEXT:or a1, a2, a1 +; CHECK-ALIGNED-RV64-NEXT:sltu a2, a1, a0 +; CHECK-ALIGNED-RV64-NEXT:sltu a0, a0, a1 +; CHECK-ALIGNED-RV64-NEXT:sub a0, a2, a0 ; CHECK-ALIGNED-RV64-NEXT:ret ; ; CHECK-UNALIGNED-RV32-LABEL: memcmp_size_4: ; CHECK-UNALIGNED-RV32: # %bb.0: # %entry -; CHECK-UNALIGNED-RV32-NEXT:addi sp, sp, -16 -; CHECK-UNALIGNED-RV32-NEXT:sw ra, 12(sp) # 4-byte Folded Spill -; CHECK-UNALIGNED-RV32-NEXT:li a2, 4 -; CHECK-UNALIGNED-RV32-NEXT:call memcmp -; CHECK-UNALIGNED-RV32-NEXT:lw ra, 12(sp) # 4-byte Folded Reload -; CHECK-UNALIGNED-RV32-NEXT:addi sp, sp, 16 +; CHECK-UNALIGNED-RV32-NEXT:lw a0, 0(a0) wangpc-pp wrote: Here is the code of memcmp copied from glibc: https://godbolt.org/z/4KxPTE6q1 There are many cases (which means branches) in this general implementation, at least we can benefit from unrolling and removal of 
branches. https://github.com/llvm/llvm-project/pull/107548
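For context on why inline expansion helps here, the branch-free pattern the new CHECK lines above encode corresponds roughly to the following scalar sketch, assuming a GCC/Clang compiler and a little-endian host; it illustrates the idea, not the backend's actual lowering.

```cpp
#include <cstdint>
#include <cstring>

// What an expanded memcmp(s1, s2, 4) computes, without a libcall or branches.
int memcmp4(const void *s1, const void *s2) {
  std::uint32_t a, b;
  std::memcpy(&a, s1, 4);     // may compile to a single 4-byte load
  std::memcpy(&b, s2, 4);
  a = __builtin_bswap32(a);   // little-endian host: byte-swap so integer
  b = __builtin_bswap32(b);   // order matches memcmp's lexicographic order
  return (a > b) - (a < b);   // the sltu/sltu/sub tail seen in the asm above
}
```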
[llvm-branch-commits] [llvm] [PAC][CodeGen][ELF][AArch64] Support signed GOT with tiny code model (PR #113812)
https://github.com/kovdan01 updated https://github.com/llvm/llvm-project/pull/113812 >From c2ffa88c7b9f8e7a6b12cef59c83b288382c402b Mon Sep 17 00:00:00 2001 From: Daniil Kovalev Date: Sun, 27 Oct 2024 17:23:17 +0300 Subject: [PATCH] [PAC][CodeGen][ELF][AArch64] Support signed GOT with tiny code model Support the following relocations and assembly operators: - `R_AARCH64_AUTH_GOT_ADR_PREL_LO21` (`:got_auth:` for `adr`) - `R_AARCH64_AUTH_GOT_LD_PREL19` (`:got_auth:` for `ldr`) `LOADgotAUTH` pseudo-instruction is expanded to actual instruction sequence like the following. ``` adr x16, :got_auth:sym ldr x0, [x16] autia x0, x16 ``` Both SelectionDAG and GlobalISel are suppported. For FastISel, we fall back to SelectionDAG. Tests starting with 'ptrauth-' have corresponding variants w/o this prefix. --- llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp | 48 +++-- .../AArch64/AsmParser/AArch64AsmParser.cpp| 8 +- .../MCTargetDesc/AArch64ELFObjectWriter.cpp | 18 ++ .../CodeGen/AArch64/ptrauth-extern-weak.ll| 42 .../CodeGen/AArch64/ptrauth-tiny-model-pic.ll | 182 ++ .../AArch64/ptrauth-tiny-model-static.ll | 157 +++ llvm/test/MC/AArch64/arm64-elf-relocs.s | 13 ++ llvm/test/MC/AArch64/ilp32-diagnostics.s | 6 + 8 files changed, 455 insertions(+), 19 deletions(-) create mode 100644 llvm/test/CodeGen/AArch64/ptrauth-tiny-model-pic.ll create mode 100644 llvm/test/CodeGen/AArch64/ptrauth-tiny-model-static.ll diff --git a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp index e79457f925db66..c2a7450ffb9132 100644 --- a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp +++ b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp @@ -2277,28 +2277,40 @@ void AArch64AsmPrinter::LowerLOADgotAUTH(const MachineInstr &MI) { const MachineOperand &GAMO = MI.getOperand(1); assert(GAMO.getOffset() == 0); - MachineOperand GAHiOp(GAMO); - MachineOperand GALoOp(GAMO); - GAHiOp.addTargetFlag(AArch64II::MO_PAGE); - GALoOp.addTargetFlag(AArch64II::MO_PAGEOFF | AArch64II::MO_NC); + if (MI.getParent()->getParent()->getTarget().getCodeModel() == + CodeModel::Tiny) { +MCOperand GAMC; +MCInstLowering.lowerOperand(GAMO, GAMC); +EmitToStreamer( +MCInstBuilder(AArch64::ADR).addReg(AArch64::X17).addOperand(GAMC)); +EmitToStreamer(MCInstBuilder(AArch64::LDRXui) + .addReg(AuthResultReg) + .addReg(AArch64::X17) + .addImm(0)); + } else { +MachineOperand GAHiOp(GAMO); +MachineOperand GALoOp(GAMO); +GAHiOp.addTargetFlag(AArch64II::MO_PAGE); +GALoOp.addTargetFlag(AArch64II::MO_PAGEOFF | AArch64II::MO_NC); - MCOperand GAMCHi, GAMCLo; - MCInstLowering.lowerOperand(GAHiOp, GAMCHi); - MCInstLowering.lowerOperand(GALoOp, GAMCLo); +MCOperand GAMCHi, GAMCLo; +MCInstLowering.lowerOperand(GAHiOp, GAMCHi); +MCInstLowering.lowerOperand(GALoOp, GAMCLo); - EmitToStreamer( - MCInstBuilder(AArch64::ADRP).addReg(AArch64::X17).addOperand(GAMCHi)); +EmitToStreamer( +MCInstBuilder(AArch64::ADRP).addReg(AArch64::X17).addOperand(GAMCHi)); - EmitToStreamer(MCInstBuilder(AArch64::ADDXri) - .addReg(AArch64::X17) - .addReg(AArch64::X17) - .addOperand(GAMCLo) - .addImm(0)); +EmitToStreamer(MCInstBuilder(AArch64::ADDXri) + .addReg(AArch64::X17) + .addReg(AArch64::X17) + .addOperand(GAMCLo) + .addImm(0)); - EmitToStreamer(MCInstBuilder(AArch64::LDRXui) - .addReg(AuthResultReg) - .addReg(AArch64::X17) - .addImm(0)); +EmitToStreamer(MCInstBuilder(AArch64::LDRXui) + .addReg(AuthResultReg) + .addReg(AArch64::X17) + .addImm(0)); + } assert(GAMO.isGlobal()); MCSymbol *UndefWeakSym; diff --git 
a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp index b83ca3f7e52db4..de8e0a4731e419 100644 --- a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp +++ b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp @@ -3353,7 +3353,13 @@ ParseStatus AArch64AsmParser::tryParseAdrLabel(OperandVector &Operands) { // No modifier was specified at all; this is the syntax for an ELF basic // ADR relocation (unfortunately). Expr = AArch64MCExpr::create(Expr, AArch64MCExpr::VK_ABS, getContext()); -} else { +} else if (ELFRefKind != AArch64MCExpr::VK_GOT_AUTH_PAGE) { + // For tiny code model, we use :got_auth: operator to fill 21-bit imm of + // adr. It's not actually GOT entry page address but the GOT address + // itself - we just share the same variant kind with :got_auth: operator + // applied for adrp. +
[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed TLSDESC (PR #113817)
@@ -1355,6 +1355,36 @@ unsigned RelocationScanner::handleTlsRelocation(RelExpr expr, RelType type, return 1; } + auto fatalBothAuthAndNonAuth = [&sym]() { +fatal("both AUTH and non-AUTH TLSDESC entries for '" + sym.getName() + + "' requested, but only one type of TLSDESC entry per symbol is " + "supported"); + }; + + // Do not optimize signed TLSDESC as described in pauthabielf64 to LE/IE. + // https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#general-restrictions + // > PAUTHELF64 only supports the descriptor based TLS (TLSDESC). + if (oneof( + expr)) { +assert(ctx.arg.emachine == EM_AARCH64); +if (!sym.hasFlag(NEEDS_TLSDESC)) + sym.setFlags(NEEDS_TLSDESC | NEEDS_TLSDESC_AUTH); +else if (!sym.hasFlag(NEEDS_TLSDESC_AUTH)) + fatalBothAuthAndNonAuth(); +sec->addReloc({expr, type, offset, addend, &sym}); +return 1; + } + + if (sym.hasFlag(NEEDS_TLSDESC_AUTH)) { +assert(ctx.arg.emachine == EM_AARCH64); +// TLSDESC_CALL hint relocation probably should not be emitted by compiler +// with signed TLSDESC enabled since it does not give any value, but leave a +// check against that just in case someone uses it. +if (expr != R_TLSDESC_CALL) + fatalBothAuthAndNonAuth(); kovdan01 wrote: The logic of this code and the code above is the following. We check the relocation expression against the AUTH variants. 1. If it matches, the symbol should either already have both `NEEDS_TLSDESC` and `NEEDS_TLSDESC_AUTH` or have neither of them. The symbol having only `NEEDS_TLSDESC` means that a non-auth entry was requested previously, and now we are requesting an auth one, which is currently not supported. 2. If it does not match, but the `NEEDS_TLSDESC_AUTH` flag was already set previously, it means that an auth entry was requested previously, and a non-auth one is requested now, which is currently not supported. The only case where we don't emit an error is an expr equal to `R_TLSDESC_CALL`: this is a hint relocation that does not result in a non-auth entry by itself. See the comment right above this if statement. https://github.com/llvm/llvm-project/pull/113817
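To make the case analysis above concrete, here is a minimal standalone model of the flag protocol; the names `NEEDS_TLSDESC`/`NEEDS_TLSDESC_AUTH` mirror the patch, while the function signature and error handling are simplified assumptions, not lld's real `Symbol`/`RelExpr`/`fatal` machinery.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

enum : std::uint16_t { NEEDS_TLSDESC = 1 << 0, NEEDS_TLSDESC_AUTH = 1 << 1 };

// First request wins: mixing AUTH and non-AUTH TLSDESC for one symbol is an
// error, except for the R_TLSDESC_CALL hint, which creates no entry itself.
void requestTlsdesc(std::uint16_t &flags, bool isAuthExpr,
                    bool isTlsdescCallHint, const std::string &name) {
  if (isAuthExpr) {
    if (!(flags & NEEDS_TLSDESC))
      flags |= NEEDS_TLSDESC | NEEDS_TLSDESC_AUTH;  // case 1: first was auth
    else if (!(flags & NEEDS_TLSDESC_AUTH))         // case 1: non-auth first
      throw std::runtime_error("both AUTH and non-AUTH TLSDESC entries for '" +
                               name + "' requested");
    return;
  }
  if ((flags & NEEDS_TLSDESC_AUTH) && !isTlsdescCallHint)  // case 2
    throw std::runtime_error("both AUTH and non-AUTH TLSDESC entries for '" +
                             name + "' requested");
  if (!isTlsdescCallHint)
    flags |= NEEDS_TLSDESC;  // ordinary non-auth TLSDESC request
}
```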
[llvm-branch-commits] [llvm] ValueTracking: Allow getUnderlyingObject to look at vectors (PR #114311)
https://github.com/nikic approved this pull request. A tentative LGTM. I *think* this particular change is fine, but it's a dangerous area, because basically all of AA does not support vectors of pointers at all and treats them as escapes. It wouldn't surprise me if this causes a miscompile. https://github.com/llvm/llvm-project/pull/114311
[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed GOT with tiny code model (PR #113816)
@@ -78,6 +78,79 @@ _start: adrp x1, :got_auth:zed add x1, x1, :got_auth_lo12:zed +#--- ok-tiny.s + +# RUN: llvm-mc -filetype=obj -triple=aarch64-none-linux ok-tiny.s -o ok-tiny.o + +# RUN: ld.lld ok-tiny.o a.so -pie -o external-tiny +# RUN: llvm-readelf -r -S -x .got external-tiny | FileCheck %s --check-prefix=EXTERNAL-TINY + +# RUN: ld.lld ok-tiny.o a.o -pie -o local-tiny +# RUN: llvm-readelf -r -S -x .got -s local-tiny | FileCheck %s --check-prefix=LOCAL-TINY + +# EXTERNAL-TINY: OffsetInfo Type Symbol's Value Symbol's Name + Addend +# EXTERNAL-TINY-NEXT: 00020380 0001e201 R_AARCH64_AUTH_GLOB_DAT bar + 0 +# EXTERNAL-TINY-NEXT: 00020388 0002e201 R_AARCH64_AUTH_GLOB_DAT zed + 0 + +## Symbol's values for bar and zed are equal since they contain no content (see Inputs/shared.s) +# LOCAL-TINY: OffsetInfo Type Symbol's Value Symbol's Name + Addend +# LOCAL-TINY-NEXT:00020320 0411 R_AARCH64_AUTH_RELATIVE 10260 +# LOCAL-TINY-NEXT:00020328 0411 R_AARCH64_AUTH_RELATIVE 10260 + +# EXTERNAL-TINY: Hex dump of section '.got': +# EXTERNAL-TINY-NEXT: 0x00020380 0080 00a0 +# ^^ +# 0b1000 bit 63 address diversity = true, bits 61..60 key = IA +# ^^ +# 0b1010 bit 63 address diversity = true, bits 61..60 key = DA kovdan01 wrote: These are not output lines to be checked and matched, but comments that help understand the contents of the hex dump. I've changed the prefix to `##` so it's clear that it's a comment and not a special line like RUN/CHECK/etc. See e841e190df73a5cbc6639cb40c467623f1b953ac https://github.com/llvm/llvm-project/pull/113816
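A small decoding sketch may help when reading dumps like the one annotated above: in the signed-GOT layout those comments describe, bit 63 of the entry holds the address-diversity flag and bits 61..60 the key (0 = IA, 1 = IB, 2 = DA, 3 = DB). The helper below is an illustration, not part of lld.

```cpp
#include <cstdint>
#include <cstdio>

void describeAuthGotWord(std::uint64_t w) {
  bool addrDiversity = (w >> 63) & 1;  // bit 63: address diversity
  unsigned key = (w >> 60) & 3;        // bits 61..60: 0=IA, 1=IB, 2=DA, 3=DB
  static const char *Keys[] = {"IA", "IB", "DA", "DB"};
  std::printf("address diversity: %d, key: %s\n", addrDiversity, Keys[key]);
}

int main() {
  describeAuthGotWord(0x8000000000000000ULL); // the `bar` slot above: IA
  describeAuthGotWord(0xa000000000000000ULL); // the `zed` slot above: DA
}
```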
[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed GOT with tiny code model (PR #113816)
https://github.com/kovdan01 edited https://github.com/llvm/llvm-project/pull/113816
[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed TLSDESC (PR #113817)
https://github.com/kovdan01 updated https://github.com/llvm/llvm-project/pull/113817 >From d89a47e22f427f8fe989ca24c9289821c8bda09d Mon Sep 17 00:00:00 2001 From: Daniil Kovalev Date: Fri, 25 Oct 2024 12:32:27 +0300 Subject: [PATCH 1/2] [PAC][lld][AArch64][ELF] Support signed TLSDESC Support `R_AARCH64_AUTH_TLSDESC_ADR_PAGE21`, `R_AARCH64_AUTH_TLSDESC_LD64_LO12` and `R_AARCH64_AUTH_TLSDESC_LD64_LO12` static TLSDESC relocations. --- lld/ELF/Arch/AArch64.cpp | 8 ++ lld/ELF/InputSection.cpp | 2 + lld/ELF/Relocations.cpp | 38 +++- lld/ELF/Relocations.h| 4 + lld/ELF/Symbols.h| 1 + lld/ELF/SyntheticSections.cpp| 5 + lld/test/ELF/aarch64-tlsdesc-pauth.s | 134 +++ 7 files changed, 190 insertions(+), 2 deletions(-) create mode 100644 lld/test/ELF/aarch64-tlsdesc-pauth.s diff --git a/lld/ELF/Arch/AArch64.cpp b/lld/ELF/Arch/AArch64.cpp index 86f509f3fd78a7..8ad466bf49878b 100644 --- a/lld/ELF/Arch/AArch64.cpp +++ b/lld/ELF/Arch/AArch64.cpp @@ -157,9 +157,14 @@ RelExpr AArch64::getRelExpr(RelType type, const Symbol &s, return R_AARCH64_AUTH; case R_AARCH64_TLSDESC_ADR_PAGE21: return R_AARCH64_TLSDESC_PAGE; + case R_AARCH64_AUTH_TLSDESC_ADR_PAGE21: +return R_AARCH64_AUTH_TLSDESC_PAGE; case R_AARCH64_TLSDESC_LD64_LO12: case R_AARCH64_TLSDESC_ADD_LO12: return R_TLSDESC; + case R_AARCH64_AUTH_TLSDESC_LD64_LO12: + case R_AARCH64_AUTH_TLSDESC_ADD_LO12: +return RelExpr::R_AARCH64_AUTH_TLSDESC; case R_AARCH64_TLSDESC_CALL: return R_TLSDESC_CALL; case R_AARCH64_TLSLE_ADD_TPREL_HI12: @@ -543,6 +548,7 @@ void AArch64::relocate(uint8_t *loc, const Relocation &rel, case R_AARCH64_ADR_PREL_PG_HI21: case R_AARCH64_TLSIE_ADR_GOTTPREL_PAGE21: case R_AARCH64_TLSDESC_ADR_PAGE21: + case R_AARCH64_AUTH_TLSDESC_ADR_PAGE21: checkInt(ctx, loc, val, 33, rel); [[fallthrough]]; case R_AARCH64_ADR_PREL_PG_HI21_NC: @@ -593,6 +599,7 @@ void AArch64::relocate(uint8_t *loc, const Relocation &rel, case R_AARCH64_TLSIE_LD64_GOTTPREL_LO12_NC: case R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC: case R_AARCH64_TLSDESC_LD64_LO12: + case R_AARCH64_AUTH_TLSDESC_LD64_LO12: checkAlignment(ctx, loc, val, 8, rel); write32Imm12(loc, getBits(val, 3, 11)); break; @@ -667,6 +674,7 @@ void AArch64::relocate(uint8_t *loc, const Relocation &rel, break; case R_AARCH64_TLSLE_ADD_TPREL_LO12_NC: case R_AARCH64_TLSDESC_ADD_LO12: + case R_AARCH64_AUTH_TLSDESC_ADD_LO12: write32Imm12(loc, val); break; case R_AARCH64_TLSDESC: diff --git a/lld/ELF/InputSection.cpp b/lld/ELF/InputSection.cpp index ccc7cf8c6e2de9..b3303c59a3b4a5 100644 --- a/lld/ELF/InputSection.cpp +++ b/lld/ELF/InputSection.cpp @@ -935,12 +935,14 @@ uint64_t InputSectionBase::getRelocTargetVA(Ctx &ctx, const Relocation &r, case R_SIZE: return r.sym->getSize() + a; case R_TLSDESC: + case RelExpr::R_AARCH64_AUTH_TLSDESC: return ctx.in.got->getTlsDescAddr(*r.sym) + a; case R_TLSDESC_PC: return ctx.in.got->getTlsDescAddr(*r.sym) + a - p; case R_TLSDESC_GOTPLT: return ctx.in.got->getTlsDescAddr(*r.sym) + a - ctx.in.gotPlt->getVA(); case R_AARCH64_TLSDESC_PAGE: + case R_AARCH64_AUTH_TLSDESC_PAGE: return getAArch64Page(ctx.in.got->getTlsDescAddr(*r.sym) + a) - getAArch64Page(p); case R_LOONGARCH_TLSDESC_PAGE_PC: diff --git a/lld/ELF/Relocations.cpp b/lld/ELF/Relocations.cpp index dbe0bcfcdc34f6..f53406cbf63566 100644 --- a/lld/ELF/Relocations.cpp +++ b/lld/ELF/Relocations.cpp @@ -1352,6 +1352,36 @@ unsigned RelocationScanner::handleTlsRelocation(RelExpr expr, RelType type, return 1; } + auto fatalBothAuthAndNonAuth = [&sym]() { +fatal("both AUTH and non-AUTH TLSDESC entries for '" + sym.getName() + + "' 
requested, but only one type of TLSDESC entry per symbol is " + "supported"); + }; + + // Do not optimize signed TLSDESC as described in pauthabielf64 to LE/IE. + // https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#general-restrictions + // > PAUTHELF64 only supports the descriptor based TLS (TLSDESC). + if (oneof( + expr)) { +assert(ctx.arg.emachine == EM_AARCH64); +if (!sym.hasFlag(NEEDS_TLSDESC)) + sym.setFlags(NEEDS_TLSDESC | NEEDS_TLSDESC_AUTH); +else if (!sym.hasFlag(NEEDS_TLSDESC_AUTH)) + fatalBothAuthAndNonAuth(); +sec->addReloc({expr, type, offset, addend, &sym}); +return 1; + } + + if (sym.hasFlag(NEEDS_TLSDESC_AUTH)) { +assert(ctx.arg.emachine == EM_AARCH64); +// TLSDESC_CALL hint relocation probably should not be emitted by compiler +// with signed TLSDESC enabled since it does not give any value, but leave a +// check against that just in case someone uses it. +if (expr != R_TLSDESC_CALL) + fatalBothAuthAndNonAuth(); +return 1; + } + bool isRISCV = ctx.arg.emachine
[llvm-branch-commits] [lld] [PAC][lld][AArch64][ELF] Support signed GOT with tiny code model (PR #113816)
https://github.com/kovdan01 updated https://github.com/llvm/llvm-project/pull/113816 >From 4b1795d57490dbcef1cf7ce17739a0d6023e5cca Mon Sep 17 00:00:00 2001 From: Daniil Kovalev Date: Fri, 25 Oct 2024 21:28:18 +0300 Subject: [PATCH 1/2] [PAC][lld][AArch64][ELF] Support signed GOT with tiny code model Support `R_AARCH64_AUTH_GOT_ADR_PREL_LO21` and `R_AARCH64_AUTH_GOT_LD_PREL19` GOT-generating relocations. --- lld/ELF/Arch/AArch64.cpp | 5 ++ lld/ELF/InputSection.cpp | 1 + lld/ELF/Relocations.cpp | 17 ++--- lld/ELF/Relocations.h| 1 + lld/test/ELF/aarch64-got-relocations-pauth.s | 73 5 files changed, 89 insertions(+), 8 deletions(-) diff --git a/lld/ELF/Arch/AArch64.cpp b/lld/ELF/Arch/AArch64.cpp index 86f509f3fd78a7..2f2e0c2a52b0ef 100644 --- a/lld/ELF/Arch/AArch64.cpp +++ b/lld/ELF/Arch/AArch64.cpp @@ -205,6 +205,9 @@ RelExpr AArch64::getRelExpr(RelType type, const Symbol &s, case R_AARCH64_AUTH_LD64_GOT_LO12_NC: case R_AARCH64_AUTH_GOT_ADD_LO12_NC: return R_AARCH64_AUTH_GOT; + case R_AARCH64_AUTH_GOT_LD_PREL19: + case R_AARCH64_AUTH_GOT_ADR_PREL_LO21: +return R_AARCH64_AUTH_GOT_PC; case R_AARCH64_LD64_GOTPAGE_LO15: return R_AARCH64_GOT_PAGE; case R_AARCH64_ADR_GOT_PAGE: @@ -549,6 +552,7 @@ void AArch64::relocate(uint8_t *loc, const Relocation &rel, write32AArch64Addr(loc, val >> 12); break; case R_AARCH64_ADR_PREL_LO21: + case R_AARCH64_AUTH_GOT_ADR_PREL_LO21: checkInt(ctx, loc, val, 21, rel); write32AArch64Addr(loc, val); break; @@ -569,6 +573,7 @@ void AArch64::relocate(uint8_t *loc, const Relocation &rel, case R_AARCH64_CONDBR19: case R_AARCH64_LD_PREL_LO19: case R_AARCH64_GOT_LD_PREL19: + case R_AARCH64_AUTH_GOT_LD_PREL19: checkAlignment(ctx, loc, val, 4, rel); checkInt(ctx, loc, val, 21, rel); writeMaskedBits32le(loc, (val & 0x1C) << 3, 0x1C << 3); diff --git a/lld/ELF/InputSection.cpp b/lld/ELF/InputSection.cpp index ccc7cf8c6e2de9..ba135afd3580bf 100644 --- a/lld/ELF/InputSection.cpp +++ b/lld/ELF/InputSection.cpp @@ -788,6 +788,7 @@ uint64_t InputSectionBase::getRelocTargetVA(Ctx &ctx, const Relocation &r, case R_AARCH64_GOT_PAGE: return r.sym->getGotVA(ctx) + a - getAArch64Page(ctx.in.got->getVA()); case R_GOT_PC: + case R_AARCH64_AUTH_GOT_PC: case R_RELAX_TLS_GD_TO_IE: return r.sym->getGotVA(ctx) + a - p; case R_GOTPLT_GOTREL: diff --git a/lld/ELF/Relocations.cpp b/lld/ELF/Relocations.cpp index dbe0bcfcdc34f6..2e679834add158 100644 --- a/lld/ELF/Relocations.cpp +++ b/lld/ELF/Relocations.cpp @@ -210,11 +210,11 @@ static bool needsPlt(RelExpr expr) { } bool lld::elf::needsGot(RelExpr expr) { - return oneof( - expr); + return oneof(expr); } // True if this expression is of the form Sym - X, where X is a position in the @@ -1011,8 +1011,8 @@ bool RelocationScanner::isStaticLinkTimeConstant(RelExpr e, RelType type, R_GOTONLY_PC, R_GOTPLTONLY_PC, R_PLT_PC, R_PLT_GOTREL, R_PLT_GOTPLT, R_GOTPLT_GOTREL, R_GOTPLT_PC, R_PPC32_PLTREL, R_PPC64_CALL_PLT, R_PPC64_RELAX_TOC, R_RISCV_ADD, R_AARCH64_GOT_PAGE, -R_AARCH64_AUTH_GOT, R_LOONGARCH_PLT_PAGE_PC, R_LOONGARCH_GOT, -R_LOONGARCH_GOT_PAGE_PC>(e)) +R_AARCH64_AUTH_GOT, R_AARCH64_AUTH_GOT_PC, R_LOONGARCH_PLT_PAGE_PC, +R_LOONGARCH_GOT, R_LOONGARCH_GOT_PAGE_PC>(e)) return true; // These never do, except if the entire file is position dependent or if @@ -1126,7 +1126,8 @@ void RelocationScanner::processAux(RelExpr expr, RelType type, uint64_t offset, // Many LoongArch TLS relocs reuse the R_LOONGARCH_GOT type, in which // case the NEEDS_GOT flag shouldn't get set. 
bool needsGotAuth = - (expr == R_AARCH64_AUTH_GOT || expr == R_AARCH64_AUTH_GOT_PAGE_PC); + (expr == R_AARCH64_AUTH_GOT || expr == R_AARCH64_AUTH_GOT_PC || + expr == R_AARCH64_AUTH_GOT_PAGE_PC); uint16_t flags = sym.flags.load(std::memory_order_relaxed); if (!(flags & NEEDS_GOT)) { sym.setFlags(needsGotAuth ? (NEEDS_GOT | NEEDS_GOT_AUTH) : NEEDS_GOT); diff --git a/lld/ELF/Relocations.h b/lld/ELF/Relocations.h index 20d88de402ac18..38d55d46116569 100644 --- a/lld/ELF/Relocations.h +++ b/lld/ELF/Relocations.h @@ -89,6 +89,7 @@ enum RelExpr { R_AARCH64_AUTH_GOT_PAGE_PC, R_AARCH64_GOT_PAGE, R_AARCH64_AUTH_GOT, + R_AARCH64_AUTH_GOT_PC, R_AARCH64_PAGE_PC, R_AARCH64_RELAX_TLS_GD_TO_IE_PAGE_PC, R_AARCH64_TLSDESC_PAGE, diff --git a/lld/test/ELF/aarch64-got-relocations-pauth.s b/lld/test/ELF/aarch64-got-relocations-pauth.s index ef089b61b6771c..c6cfd0c18b15f9 100644 --- a/lld/test/ELF/aarch64-got-relocations-pauth.s +++ b/lld/test/ELF/aarch64-got-relocations-pauth.s @@ -78,6 +78,79 @@ _start: adrp x1, :got_auth:zed add x1, x1, :got_auth_lo12:zed +#--- ok-tiny.s + +# RUN: ll
[llvm-branch-commits] [RISCV] Support memcmp expansion for vectors (PR #114517)
llvmbot wrote: @llvm/pr-subscribers-backend-risc-v Author: Pengcheng Wang (wangpc-pp) Changes --- Patch is 404.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/114517.diff 4 Files Affected: - (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+100-3) - (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+5) - (modified) llvm/test/CodeGen/RISCV/memcmp-optsize.ll (+920-530) - (modified) llvm/test/CodeGen/RISCV/memcmp.ll (+4570-1843) ``diff diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp index 3b3f8772a08940..89b4f22a1260db 100644 --- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp +++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp @@ -23,6 +23,7 @@ #include "llvm/ADT/Statistic.h" #include "llvm/Analysis/MemoryLocation.h" #include "llvm/Analysis/VectorUtils.h" +#include "llvm/CodeGen/ISDOpcodes.h" #include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineInstrBuilder.h" @@ -14474,17 +14475,116 @@ static bool narrowIndex(SDValue &N, ISD::MemIndexType IndexType, SelectionDAG &D return true; } +/// Recursive helper for combineVectorSizedSetCCEquality() to see if we have a +/// recognizable memcmp expansion. +static bool isOrXorXorTree(SDValue X, bool Root = true) { + if (X.getOpcode() == ISD::OR) +return isOrXorXorTree(X.getOperand(0), false) && + isOrXorXorTree(X.getOperand(1), false); + if (Root) +return false; + return X.getOpcode() == ISD::XOR; +} + +/// Recursive helper for combineVectorSizedSetCCEquality() to emit the memcmp +/// expansion. +static SDValue emitOrXorXorTree(SDValue X, const SDLoc &DL, SelectionDAG &DAG, +EVT VecVT, EVT CmpVT) { + SDValue Op0 = X.getOperand(0); + SDValue Op1 = X.getOperand(1); + if (X.getOpcode() == ISD::OR) { +SDValue A = emitOrXorXorTree(Op0, DL, DAG, VecVT, CmpVT); +SDValue B = emitOrXorXorTree(Op1, DL, DAG, VecVT, CmpVT); +if (VecVT != CmpVT) + return DAG.getNode(ISD::OR, DL, CmpVT, A, B); +return DAG.getNode(ISD::AND, DL, CmpVT, A, B); + } + if (X.getOpcode() == ISD::XOR) { +SDValue A = DAG.getBitcast(VecVT, Op0); +SDValue B = DAG.getBitcast(VecVT, Op1); +if (VecVT != CmpVT) + return DAG.getSetCC(DL, CmpVT, A, B, ISD::SETNE); +return DAG.getSetCC(DL, CmpVT, A, B, ISD::SETEQ); + } + llvm_unreachable("Impossible"); +} + +/// Try to map a 128-bit or larger integer comparison to vector instructions +/// before type legalization splits it up into chunks. +static SDValue +combineVectorSizedSetCCEquality(EVT VT, SDValue X, SDValue Y, ISD::CondCode CC, +const SDLoc &DL, SelectionDAG &DAG, +const RISCVSubtarget &Subtarget) { + assert((CC == ISD::SETNE || CC == ISD::SETEQ) && "Bad comparison predicate"); + + EVT OpVT = X.getValueType(); + MVT XLenVT = Subtarget.getXLenVT(); + unsigned OpSize = OpVT.getSizeInBits(); + + // We're looking for an oversized integer equality comparison. + if (!Subtarget.hasVInstructions() || !OpVT.isScalarInteger() || + OpSize < Subtarget.getRealMinVLen() || + OpSize > Subtarget.getRealMinVLen() * 8) +return SDValue(); + + bool IsOrXorXorTreeCCZero = isNullConstant(Y) && isOrXorXorTree(X); + if (isNullConstant(Y) && !IsOrXorXorTreeCCZero) +return SDValue(); + + // Don't perform this combine if constructing the vector will be expensive. 
+ auto IsVectorBitCastCheap = [](SDValue X) { +X = peekThroughBitcasts(X); +return isa(X) || X.getValueType().isVector() || + X.getOpcode() == ISD::LOAD; + }; + if ((!IsVectorBitCastCheap(X) || !IsVectorBitCastCheap(Y)) && + !IsOrXorXorTreeCCZero) +return SDValue(); + + bool NoImplicitFloatOps = + DAG.getMachineFunction().getFunction().hasFnAttribute( + Attribute::NoImplicitFloat); + if (!NoImplicitFloatOps && Subtarget.hasVInstructions()) { +unsigned VecSize = OpSize / 8; +EVT VecVT = MVT::getVectorVT(MVT::i8, VecSize); +EVT CmpVT = MVT::getVectorVT(MVT::i1, VecSize); + +SDValue Cmp; +if (IsOrXorXorTreeCCZero) { + Cmp = emitOrXorXorTree(X, DL, DAG, VecVT, CmpVT); +} else { + SDValue VecX = DAG.getBitcast(VecVT, X); + SDValue VecY = DAG.getBitcast(VecVT, Y); + Cmp = DAG.getSetCC(DL, CmpVT, VecX, VecY, ISD::SETEQ); +} +return DAG.getSetCC(DL, VT, +DAG.getNode(ISD::VECREDUCE_AND, DL, XLenVT, Cmp), +DAG.getConstant(0, DL, XLenVT), CC); + } + + return SDValue(); +} + // Replace (seteq (i64 (and X, 0x)), C1) with // (seteq (i64 (sext_inreg (X, i32)), C1')) where C1' is C1 sign extended from // bit 31. Same for setne. C1' may be cheaper to materialize and the sext_inreg // can become a sext.w instead of a shift pair. static SDValue performSETCCCombine(SDNode *N, Selecti
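In scalar terms, the `isOrXorXorTree`/`emitOrXorXorTree` helpers in the diff above recognize and rebuild the classic equality-only memcmp expansion, which the combine then lowers to a single vector compare plus an AND reduction. The sketch below shows that source pattern, assuming a 256-bit (32-byte) comparison.

```cpp
#include <cstdint>
#include <cstring>

// The "or-xor-xor tree": ((a0^b0)|(a1^b1))|((a2^b2)|(a3^b3)) == 0 is a pure
// equality test, so it can become one vector SETEQ plus a VECREDUCE_AND.
bool eq32(const void *p, const void *q) {
  std::uint64_t a[4], b[4];
  std::memcpy(a, p, 32);
  std::memcpy(b, q, 32);
  return (((a[0] ^ b[0]) | (a[1] ^ b[1])) |
          ((a[2] ^ b[2]) | (a[3] ^ b[3]))) == 0;  // the or-xor-xor tree
}
```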
[llvm-branch-commits] [RISCV] Support memcmp expansion for vectors (PR #114517)
https://github.com/wangpc-pp created https://github.com/llvm/llvm-project/pull/114517 None
[llvm-branch-commits] [lld] [PAC][lld] Use braa instr in PAC PLT sequence with valid PAuth core info (PR #113945)
https://github.com/kovdan01 updated https://github.com/llvm/llvm-project/pull/113945 >From f2daf75b8506e31180f2d41291c6f1a63da5138b Mon Sep 17 00:00:00 2001 From: Daniil Kovalev Date: Mon, 28 Oct 2024 21:23:54 +0300 Subject: [PATCH 1/2] [PAC][lld] Use braa instr in PAC PLT sequence with valid PAuth core info Assume PAC instructions being supported with PAuth core info different from (0,0). Given that, `autia1716; br x17` can be replaced with `braa x17, x16; nop`. --- lld/ELF/Arch/AArch64.cpp | 19 +++ lld/test/ELF/aarch64-feature-pauth.s | 10 ++ 2 files changed, 21 insertions(+), 8 deletions(-) diff --git a/lld/ELF/Arch/AArch64.cpp b/lld/ELF/Arch/AArch64.cpp index 260307ac4c3dcb..c76f226bc5511c 100644 --- a/lld/ELF/Arch/AArch64.cpp +++ b/lld/ELF/Arch/AArch64.cpp @@ -999,7 +999,9 @@ class AArch64BtiPac final : public AArch64 { private: bool btiHeader; // bti instruction needed in PLT Header and Entry - bool pacEntry; // autia1716 instruction needed in PLT Entry + bool pacEntry; // Authenticated branch needed in PLT Entry + bool pacUseHint = + true; // Use hint space instructions for authenticated branch in PLT entry }; } // namespace @@ -1016,6 +1018,10 @@ AArch64BtiPac::AArch64BtiPac(Ctx &ctx) : AArch64(ctx) { // from properties in the objects, so we use the command line flag. pacEntry = ctx.arg.zPacPlt; + if (llvm::any_of(ctx.aarch64PauthAbiCoreInfo, + [](uint8_t c) { return c != 0; })) +pacUseHint = false; + if (btiHeader || pacEntry) { pltEntrySize = 24; ipltEntrySize = 24; @@ -1066,9 +1072,13 @@ void AArch64BtiPac::writePlt(uint8_t *buf, const Symbol &sym, 0x11, 0x02, 0x40, 0xf9, // ldr x17, [x16, Offset(&(.got.plt[n]))] 0x10, 0x02, 0x00, 0x91 // add x16, x16, Offset(&(.got.plt[n])) }; + const uint8_t pacHintBr[] = { + 0x9f, 0x21, 0x03, 0xd5, // autia1716 + 0x20, 0x02, 0x1f, 0xd6 // br x17 + }; const uint8_t pacBr[] = { - 0x9f, 0x21, 0x03, 0xd5, // autia1716 - 0x20, 0x02, 0x1f, 0xd6 // br x17 + 0x30, 0x0a, 0x1f, 0xd7, // braa x17, x16 + 0x1f, 0x20, 0x03, 0xd5 // nop }; const uint8_t stdBr[] = { 0x20, 0x02, 0x1f, 0xd6, // br x17 @@ -1097,7 +1107,8 @@ void AArch64BtiPac::writePlt(uint8_t *buf, const Symbol &sym, relocateNoSym(buf + 8, R_AARCH64_ADD_ABS_LO12_NC, gotPltEntryAddr); if (pacEntry) -memcpy(buf + sizeof(addrInst), pacBr, sizeof(pacBr)); +memcpy(buf + sizeof(addrInst), (pacUseHint ? pacHintBr : pacBr), + sizeof(pacUseHint ? 
pacHintBr : pacBr)); else memcpy(buf + sizeof(addrInst), stdBr, sizeof(stdBr)); if (!hasBti) diff --git a/lld/test/ELF/aarch64-feature-pauth.s b/lld/test/ELF/aarch64-feature-pauth.s index c11073dba86f24..34f2f2698a26b8 100644 --- a/lld/test/ELF/aarch64-feature-pauth.s +++ b/lld/test/ELF/aarch64-feature-pauth.s @@ -56,8 +56,8 @@ # PACPLTTAG: 0x7003 (AARCH64_PAC_PLT) -# RUN: llvm-objdump -d pacplt-nowarn | FileCheck --check-prefix PACPLT -DA=10380 -DB=478 -DC=480 %s -# RUN: llvm-objdump -d pacplt-warn | FileCheck --check-prefix PACPLT -DA=10390 -DB=488 -DC=490 %s +# RUN: llvm-objdump -d pacplt-nowarn | FileCheck --check-prefixes=PACPLT,NOHINT -DA=10380 -DB=478 -DC=480 %s +# RUN: llvm-objdump -d pacplt-warn | FileCheck --check-prefixes=PACPLT,HINT -DA=10390 -DB=488 -DC=490 %s # PACPLT: Disassembly of section .text: # PACPLT: : @@ -77,8 +77,10 @@ # PACPLT-NEXT: adrpx16, 0x3 # PACPLT-NEXT: ldr x17, [x16, #0x[[C]]] # PACPLT-NEXT: add x16, x16, #0x[[C]] -# PACPLT-NEXT: autia1716 -# PACPLT-NEXT: br x17 +# NOHINT-NEXT: braax17, x16 +# NOHINT-NEXT: nop +# HINT-NEXT: autia1716 +# HINT-NEXT: br x17 # PACPLT-NEXT: nop #--- abi-tag-short.s >From 026d7ca30ba8a9a0e1c0242c3e2635c0c76e4500 Mon Sep 17 00:00:00 2001 From: Daniil Kovalev Date: Fri, 1 Nov 2024 14:20:44 +0300 Subject: [PATCH 2/2] Address review comments --- lld/ELF/Arch/AArch64.cpp | 30 +++--- 1 file changed, 19 insertions(+), 11 deletions(-) diff --git a/lld/ELF/Arch/AArch64.cpp b/lld/ELF/Arch/AArch64.cpp index c76f226bc5511c..e33971ea5d2499 100644 --- a/lld/ELF/Arch/AArch64.cpp +++ b/lld/ELF/Arch/AArch64.cpp @@ -999,9 +999,11 @@ class AArch64BtiPac final : public AArch64 { private: bool btiHeader; // bti instruction needed in PLT Header and Entry - bool pacEntry; // Authenticated branch needed in PLT Entry - bool pacUseHint = - true; // Use hint space instructions for authenticated branch in PLT entry + enum { +PEK_NoAuth, +PEK_AuthHint, // use autia1716 instr for authenticated branch in PLT entry +PEK_Auth, // use braa instr for authenticated branch in PLT entry + } pacEntryKind; }; } // namespace @@ -1016,13 +1018,18 @@ AArch64BtiPac::AArch64BtiPac(Ctx &ctx) : AArch64(ctx) { // relocations. // The PAC PLT en
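The second commit's selection logic can be summarized by this standalone model; the `PEK_*` names mirror the patch, while the function signature and the `std::vector` holding the PAuth core info blob are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

enum PacEntryKind { PEK_NoAuth, PEK_AuthHint, PEK_Auth };

PacEntryKind selectPacEntryKind(bool zPacPlt,
                                const std::vector<std::uint8_t> &coreInfo) {
  if (!zPacPlt)
    return PEK_NoAuth;  // no authenticated branch requested on command line
  bool hasValidCoreInfo = std::any_of(
      coreInfo.begin(), coreInfo.end(), [](std::uint8_t c) { return c != 0; });
  // Non-zero PAuth core info implies PAC instructions are available, so the
  // braa form is used; otherwise stay in the hint space (autia1716 + br x17).
  return hasValidCoreInfo ? PEK_Auth : PEK_AuthHint;
}
```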
[llvm-branch-commits] [lld] [PAC][lld] Use braa instr in PAC PLT sequence with valid PAuth core info (PR #113945)
@@ -999,7 +999,9 @@ class AArch64BtiPac final : public AArch64 { private: bool btiHeader; // bti instruction needed in PLT Header and Entry - bool pacEntry; // autia1716 instruction needed in PLT Entry + bool pacEntry; // Authenticated branch needed in PLT Entry kovdan01 wrote: Changed to an enum, thanks! It might be worth using a switch statement instead of if's and ternary operators at the end of `AArch64BtiPac::writePlt`, but it looks readable enough right now, so I left it as is unless there is a request to change that as well. https://github.com/llvm/llvm-project/pull/113945
[llvm-branch-commits] [llvm] [AArch64] Define high bits of FPR and GPR registers. (PR #114263)
@@ -424,6 +424,58 @@ AArch64RegisterInfo::explainReservedReg(const MachineFunction &MF, return {}; } +static SmallVector ReservedHi = { sdesmalen-arm wrote: I don't think there is a bug; the code for moving an instruction goes through the list of operands to update each register's live range. For each physreg it then goes through the regunits to calculate/update the live range for that regunit, but only if the regunit is not reserved. The code that determines if the register is reserved says: ``` // A register unit is considered reserved if all its roots and all their // super registers are reserved. ``` Without this change to AArch64RegisterInfo.cpp, WZR and XZR are marked as reserved, but WZR_HI isn't (because WZR_HI is a sibling of WZR, and `markSuperRegs` marks only XZR as reserved), and so `IsReserved` is `false` for the WZR_HI regunit. Why this doesn't fail for AMDGPU I don't know; perhaps these registers are always virtual and never go down this path. https://github.com/llvm/llvm-project/pull/114263
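As a sanity check of the explanation above, here is a toy model (not LLVM's API) of the quoted reservation rule; with only `WZR` and `XZR` reserved, the `WZR_HI` unit indeed comes out unreserved.

```cpp
#include <cstdio>
#include <map>
#include <set>
#include <string>
#include <vector>

int main() {
  using std::string;
  // Root register of each unit, and super-registers of each root; the
  // containment data is illustrative.
  std::map<string, string> rootOfUnit = {{"WZR-unit", "WZR"},
                                         {"WZR_HI-unit", "WZR_HI"}};
  std::map<string, std::vector<string>> superRegs = {{"WZR", {"XZR"}},
                                                     {"WZR_HI", {"XZR"}}};
  // markSuperRegs(WZR) reserves WZR and its super-register XZR, but not the
  // sibling WZR_HI, which is why its unit ends up unreserved.
  std::set<string> reserved = {"WZR", "XZR"};

  for (const auto &[unit, root] : rootOfUnit) {
    bool isReserved = reserved.count(root) != 0;
    for (const auto &s : superRegs[root])
      isReserved = isReserved && reserved.count(s) != 0;
    std::printf("%s reserved: %d\n", unit.c_str(), (int)isReserved);
    // Prints: WZR-unit reserved: 1, WZR_HI-unit reserved: 0
  }
}
```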