[llvm-branch-commits] [llvm] AMDGPU: Ensure both wavesize features are not set (PR #159234)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/159234
[llvm-branch-commits] [llvm] AMDGPU: Ensure both wavesize features are not set (PR #159234)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/159234 AMDGPU: Ensure both wavesize features are not set Make sure we cannot be in a mode with both wavesizes. This prevents assertions in a future change. This should probably just be an error, but we do not have a good way to report errors from the MCSubtargetInfo constructor. This breaks the assembler test which enables both, but this behavior is not really useful. Maybe it's better to just delete the test. Convert wave_any test to update_mc_test_checks update wave_any test >From 41365e5cc69b3732c8bc8f1d138c3b6984e08e41 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 17 Sep 2025 02:00:48 +0900 Subject: [PATCH 1/3] AMDGPU: Ensure both wavesize features are not set Make sure we cannot be in a mode with both wavesizes. This prevents assertions in a future change. This should probably just be an error, but we do not have a good way to report errors from the MCSubtargetInfo constructor. This breaks the assembler test which enables both, but this behavior is not really useful. Maybe it's better to just delete the test. --- .../MCTargetDesc/AMDGPUMCTargetDesc.cpp | 16 +++-- .../wavesize-feature-unsupported-target.s | 23 +++ .../AMDGPU/gfx1250_wave64_feature.s | 13 +++ .../AMDGPU/gfx9_wave32_feature.txt| 13 +++ 4 files changed, 63 insertions(+), 2 deletions(-) create mode 100644 llvm/test/MC/AMDGPU/wavesize-feature-unsupported-target.s create mode 100644 llvm/test/MC/Disassembler/AMDGPU/gfx1250_wave64_feature.s create mode 100644 llvm/test/MC/Disassembler/AMDGPU/gfx9_wave32_feature.txt diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp index f2e2d0ed3f8a6..0ea5ad7ccaea4 100644 --- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp +++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp @@ -82,20 +82,32 @@ createAMDGPUMCSubtargetInfo(const Triple &TT, StringRef CPU, StringRef FS) { MCSubtargetInfo *STI = createAMDGPUMCSubtargetInfoImpl(TT, CPU, /*TuneCPU*/ CPU, FS); + bool IsWave64 = STI->hasFeature(AMDGPU::FeatureWavefrontSize64); + bool IsWave32 = STI->hasFeature(AMDGPU::FeatureWavefrontSize32); + // FIXME: We should error for the default target. if (STI->getFeatureBits().none()) STI->ToggleFeature(AMDGPU::FeatureSouthernIslands); - if (!STI->hasFeature(AMDGPU::FeatureWavefrontSize64) && - !STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) { + if (!IsWave64 && !IsWave32) { // If there is no default wave size it must be a generation before gfx10, // these have FeatureWavefrontSize64 in their definition already. For gfx10+ // set wave32 as a default. STI->ToggleFeature(AMDGPU::isGFX10Plus(*STI) ? AMDGPU::FeatureWavefrontSize32 : AMDGPU::FeatureWavefrontSize64); + } else if (IsWave64 && IsWave32) { +// The wave size is mutually exclusive. If both somehow end up set, wave64 +// wins. +// +// FIXME: This should really just be an error. 
+STI->ToggleFeature(AMDGPU::FeatureWavefrontSize32); } + assert((STI->hasFeature(AMDGPU::FeatureWavefrontSize64) ^ + STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) && + "wavesize features are mutually exclusive"); + return STI; } diff --git a/llvm/test/MC/AMDGPU/wavesize-feature-unsupported-target.s b/llvm/test/MC/AMDGPU/wavesize-feature-unsupported-target.s new file mode 100644 index 0..8fc7b7fb05f0c --- /dev/null +++ b/llvm/test/MC/AMDGPU/wavesize-feature-unsupported-target.s @@ -0,0 +1,23 @@ +// RUN: llvm-mc -triple=amdgcn -mcpu=gfx1250 -mattr=+wavefrontsize64 -o - %s | FileCheck -check-prefix=GFX1250 %s +// RUN: llvm-mc -triple=amdgcn -mcpu=gfx900 -mattr=+wavefrontsize32 -o - %s | FileCheck -check-prefix=GFX900 %s + +// Both are supported, but not at the same time +// RUN: llvm-mc -triple=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,+wavefrontsize64 %s | FileCheck -check-prefixes=GFX10 %s + +// Test that there is no assertion when using an explicit +// wavefrontsize attribute on a target which does not support it. + +// GFX1250: v_add_f64_e32 v[0:1], 1.0, v[0:1] +// GFX900: v_add_f64 v[0:1], 1.0, v[0:1] +// GFX10: v_add_f64 v[0:1], 1.0, v[0:1] +v_add_f64 v[0:1], 1.0, v[0:1] + +// GFX1250: v_cmp_eq_u32_e64 s[0:1], 1.0, s1 +// GFX900: v_cmp_eq_u32_e64 s[0:1], 1.0, s1 +// GFX10: v_cmp_eq_u32_e64 s[0:1], 1.0, s1 +v_cmp_eq_u32_e64 s[0:1], 1.0, s1 + +// GFX1250: v_cndmask_b32_e64 v1, v2, v3, s[0:1] +// GFX900: v_cndmask_b32_e64 v1, v2, v3, s[0:1] +// GFX10: v_cndmask_b32_e64 v1, v2, v3, s[0:1] +v_cndmask_b32 v1, v2, v3, s[0:1] diff --git a/llvm/test/MC/Disassembler/AMDGPU/gfx1250_wave64_feature.s b/llvm/test/MC/Disassembler/AMDGPU/gfx1250_wave64_feature.s new file mode 100644 index 0..bdea636a9efe3 --- /dev/null +++ b/llvm/
[llvm-branch-commits] [llvm] AMDGPU: Ensure both wavesize features are not set (PR #159234)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes Make sure we cannot be in a mode with both wavesizes. This prevents assertions in a future change. This should probably just be an error, but we do not have a good way to report errors from the MCSubtargetInfo constructor. This breaks the assembler test which enables both, but this behavior is not really useful. Maybe it's better to just delete the test. --- Patch is 24.16 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159234.diff 5 Files Affected: - (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp (+14-2) - (modified) llvm/test/MC/AMDGPU/wave_any.s (+62-60) - (added) llvm/test/MC/AMDGPU/wavesize-feature-unsupported-target.s (+23) - (added) llvm/test/MC/Disassembler/AMDGPU/gfx1250_wave64_feature.s (+13) - (added) llvm/test/MC/Disassembler/AMDGPU/gfx9_wave32_feature.txt (+13) ``diff diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp index f2e2d0ed3f8a6..0ea5ad7ccaea4 100644 --- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp +++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp @@ -82,20 +82,32 @@ createAMDGPUMCSubtargetInfo(const Triple &TT, StringRef CPU, StringRef FS) { MCSubtargetInfo *STI = createAMDGPUMCSubtargetInfoImpl(TT, CPU, /*TuneCPU*/ CPU, FS); + bool IsWave64 = STI->hasFeature(AMDGPU::FeatureWavefrontSize64); + bool IsWave32 = STI->hasFeature(AMDGPU::FeatureWavefrontSize32); + // FIXME: We should error for the default target. if (STI->getFeatureBits().none()) STI->ToggleFeature(AMDGPU::FeatureSouthernIslands); - if (!STI->hasFeature(AMDGPU::FeatureWavefrontSize64) && - !STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) { + if (!IsWave64 && !IsWave32) { // If there is no default wave size it must be a generation before gfx10, // these have FeatureWavefrontSize64 in their definition already. For gfx10+ // set wave32 as a default. STI->ToggleFeature(AMDGPU::isGFX10Plus(*STI) ? AMDGPU::FeatureWavefrontSize32 : AMDGPU::FeatureWavefrontSize64); + } else if (IsWave64 && IsWave32) { +// The wave size is mutually exclusive. If both somehow end up set, wave64 +// wins. +// +// FIXME: This should really just be an error. 
+STI->ToggleFeature(AMDGPU::FeatureWavefrontSize32); } + assert((STI->hasFeature(AMDGPU::FeatureWavefrontSize64) ^ + STI->hasFeature(AMDGPU::FeatureWavefrontSize32)) && + "wavesize features are mutually exclusive"); + return STI; } diff --git a/llvm/test/MC/AMDGPU/wave_any.s b/llvm/test/MC/AMDGPU/wave_any.s index 27502eff89bfc..15b235a92d68e 100644 --- a/llvm/test/MC/AMDGPU/wave_any.s +++ b/llvm/test/MC/AMDGPU/wave_any.s @@ -1,229 +1,231 @@ -// RUN: llvm-mc -triple=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,+wavefrontsize64 -show-encoding %s | FileCheck --check-prefix=GFX10 %s +// NOTE: Assertions have been autogenerated by utils/update_mc_test_checks.py UTC_ARGS: --version 6 +// RUN: not llvm-mc -triple=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,+wavefrontsize64 -show-encoding %s | FileCheck --check-prefixes=GFX10 %s +// RUN: not llvm-mc -triple=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,+wavefrontsize64 -filetype=null %s 2>&1 | FileCheck -implicit-check-not=error: --check-prefixes=GFX10-ERR %s v_cmp_ge_i32_e32 s0, v0 -// GFX10: v_cmp_ge_i32_e32 vcc_lo, s0, v0 ; encoding: [0x00,0x00,0x0c,0x7d] +// GFX10: v_cmp_ge_i32_e32 vcc, s0, v0; encoding: [0x00,0x00,0x0c,0x7d] v_cmp_ge_i32_e32 vcc_lo, s0, v1 -// GFX10: v_cmp_ge_i32_e32 vcc_lo, s0, v1 ; encoding: [0x00,0x02,0x0c,0x7d] +// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode v_cmp_ge_i32_e32 vcc, s0, v2 -// GFX10: v_cmp_ge_i32_e32 vcc_lo, s0, v2 ; encoding: [0x00,0x04,0x0c,0x7d] +// GFX10: v_cmp_ge_i32_e32 vcc, s0, v2; encoding: [0x00,0x04,0x0c,0x7d] v_cmp_le_f16_sdwa s0, v3, v4 src0_sel:WORD_1 src1_sel:DWORD -// GFX10: v_cmp_le_f16_sdwa s0, v3, v4 src0_sel:WORD_1 src1_sel:DWORD ; encoding: [0xf9,0x08,0x96,0x7d,0x03,0x80,0x05,0x06] +// GFX10-ERR: :[[@LINE-1]]:19: error: invalid operand for instruction v_cmp_le_f16_sdwa s[0:1], v3, v4 src0_sel:WORD_1 src1_sel:DWORD // GFX10: v_cmp_le_f16_sdwa s[0:1], v3, v4 src0_sel:WORD_1 src1_sel:DWORD ; encoding: [0xf9,0x08,0x96,0x7d,0x03,0x80,0x05,0x06] v_cmp_class_f32_e32 vcc_lo, s0, v0 -// GFX10: v_cmp_class_f32_e32 vcc_lo, s0, v0 ; encoding: [0x00,0x00,0x10,0x7d] +// GFX10-ERR: :[[@LINE-1]]:1: error: operands are not valid for this GPU or mode v_cmp_class_f32_e32 vcc, s0, v0 -// GFX10: v_cmp_class_f32_e32 vcc_lo, s0, v0 ; encoding: [0x00,0x00,0x10,0x7d] +// GFX10: v_cmp_class_f32_e32 vcc, s0, v0 ; encoding: [0x00,0x00,0x10,0x7d] v_cmp_class_f16_sdw
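A minimal standalone sketch of the resolution order the patch establishes (assumption: plain bools stand in for the real MCSubtargetInfo feature bits, and `resolveWaveSize` is a hypothetical helper, not LLVM API):

```cpp
#include <cassert>
#include <cstdio>

struct Features {
  bool Wave32;
  bool Wave64;
  bool IsGFX10Plus;
};

// Mirrors the patch: neither wave size set -> pick the generation default;
// both set -> wave64 wins (the patch's FIXME notes this should be an error).
void resolveWaveSize(Features &F) {
  if (!F.Wave32 && !F.Wave64) {
    if (F.IsGFX10Plus)
      F.Wave32 = true;
    else
      F.Wave64 = true;
  } else if (F.Wave32 && F.Wave64) {
    F.Wave32 = false; // wave64 wins
  }
  assert((F.Wave32 ^ F.Wave64) && "wavesize features are mutually exclusive");
}

int main() {
  // Both requested, as with -mattr=+wavefrontsize32,+wavefrontsize64.
  Features F{true, true, true};
  resolveWaveSize(F);
  std::printf("wave32=%d wave64=%d\n", F.Wave32, F.Wave64); // wave32=0 wave64=1
}
```

Resolving the conflict rather than asserting keeps the subtarget in a single well-defined mode until constructor-time error reporting becomes possible.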
[llvm-branch-commits] [llvm] AMDGPU: Ensure both wavesize features are not set (PR #159234)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/159234). Learn more: https://graphite.dev/docs/merge-pull-requests * **#159234** (View in Graphite) 👈 * **#159227** * `main` This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/ https://github.com/llvm/llvm-project/pull/159234
[llvm-branch-commits] [llvm] [DA] Add test where WeakCrossingSIV misses dependency due to overflow (NFC) (PR #158281)
https://github.com/kasuga-fj edited https://github.com/llvm/llvm-project/pull/158281
[llvm-branch-commits] [llvm] [DA] Add overflow check in ExactSIV (PR #157086)
@@ -815,8 +815,8 @@ for.end: ; preds = %for.body ;; A[3*i - 2] = 1; ;; } ;; -;; FIXME: DependencyAnalsysis currently detects no dependency between -;; `A[-6*i + INT64_MAX]` and `A[3*i - 2]`, but it does exist. For example, +;; There exists dependency between `A[-6*i + INT64_MAX]` and `A[3*i - 2]`. +;; For example, kasuga-fj wrote: It is intentional. I think it's non-trivial that the dependency exists between them. https://github.com/llvm/llvm-project/pull/157086
[llvm-branch-commits] [llvm] [LoopUnroll] Fix block frequencies when no runtime (PR #157754)
https://github.com/jdenny-ornl updated https://github.com/llvm/llvm-project/pull/157754 >From 75a8df62df2ef7e8c02d7a76120e57e2dd1a1539 Mon Sep 17 00:00:00 2001 From: "Joel E. Denny" Date: Tue, 9 Sep 2025 17:33:38 -0400 Subject: [PATCH 1/2] [LoopUnroll] Fix block frequencies when no runtime This patch implements the LoopUnroll changes discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785) and is thus another step in addressing issue #135812. In summary, for the case of partial loop unrolling without a runtime, this patch changes LoopUnroll to: - Maintain branch weights consistently with the original loop for the sake of preserving the total frequency of the original loop body. - Store the new estimated trip count in the `llvm.loop.estimated_trip_count` metadata, introduced by PR #148758. - Correct the new estimated trip count (e.g., 3 instead of 2) when the original estimated trip count (e.g., 10) divided by the unroll count (e.g., 4) leaves a remainder (e.g., 2). There are loop unrolling cases this patch does not fully fix, such as partial unrolling with a runtime and complete unrolling, and there are two associated tests this patch marks as XFAIL. They will be addressed in future patches that should land with this patch. --- llvm/lib/Transforms/Utils/LoopUnroll.cpp | 36 -- .../peel.ll} | 0 .../branch-weights-freq/unroll-partial.ll | 68 +++ .../LoopUnroll/runtime-loop-branchweight.ll | 1 + .../LoopUnroll/unroll-heuristics-pgo.ll | 1 + 5 files changed, 100 insertions(+), 6 deletions(-) rename llvm/test/Transforms/LoopUnroll/{peel-branch-weights-freq.ll => branch-weights-freq/peel.ll} (100%) create mode 100644 llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial.ll diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp index 8a6c7789d1372..93c43396c54b6 100644 --- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp +++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp @@ -499,9 +499,8 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI, const unsigned MaxTripCount = SE->getSmallConstantMaxTripCount(L); const bool MaxOrZero = SE->isBackedgeTakenCountMaxOrZero(L); - unsigned EstimatedLoopInvocationWeight = 0; std::optional OriginalTripCount = - llvm::getLoopEstimatedTripCount(L, &EstimatedLoopInvocationWeight); + llvm::getLoopEstimatedTripCount(L); // Effectively "DCE" unrolled iterations that are beyond the max tripcount // and will never be executed. @@ -1130,10 +1129,35 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI, // We shouldn't try to use `L` anymore. L = nullptr; } else if (OriginalTripCount) { -// Update the trip count. Note that the remainder has already logic -// computing it in `UnrollRuntimeLoopRemainder`. -setLoopEstimatedTripCount(L, *OriginalTripCount / ULO.Count, - EstimatedLoopInvocationWeight); +// Update metadata for the estimated trip count. +// +// If ULO.Runtime, UnrollRuntimeLoopRemainder handles branch weights for the +// remainder loop it creates, and the unrolled loop's branch weights are +// adjusted below. Otherwise, if unrolled loop iterations' latches become +// unconditional, branch weights are adjusted above. Otherwise, the +// original loop's branch weights are correct for the unrolled loop, so do +// not adjust them. +// FIXME: Actually handle such unconditional latches and ULO.Runtime. 
+// +// For example, consider what happens if the unroll count is 4 for a loop +// with an estimated trip count of 10 when we do not create a remainder loop +// and all iterations' latches remain conditional. Each unrolled +// iteration's latch still has the same probability of exiting the loop as +// it did when in the original loop, and thus it should still have the same +// branch weights. Each unrolled iteration's non-zero probability of +// exiting already appropriately reduces the probability of reaching the +// remaining iterations just as it did in the original loop. Trying to also +// adjust the branch weights of the final unrolled iteration's latch (i.e., +// the backedge for the unrolled loop as a whole) to reflect its new trip +// count of 3 will erroneously further reduce its block frequencies. +// However, in case an analysis later needs to estimate the trip count of +// the unrolled loop as a whole without considering the branch weights for +// each unrolled iteration's latch within it, we store the new trip count as +// separate metadata. +unsigned NewTripCount = *OriginalTripCount / ULO.Count; +if (!ULO.Runtime && *OriginalTripCount % ULO.Count) + NewTripCount += 1; +setLoopEstima
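The rounding the patch introduces can be checked in isolation; a sketch (hypothetical `newEstimatedTripCount` helper, not the LoopUnroll code itself):

```cpp
#include <cstdio>

unsigned newEstimatedTripCount(unsigned OriginalTripCount, unsigned UnrollCount,
                               bool Runtime) {
  unsigned NewTripCount = OriginalTripCount / UnrollCount;
  if (!Runtime && OriginalTripCount % UnrollCount)
    NewTripCount += 1; // round up when the estimate is not a multiple
  return NewTripCount;
}

int main() {
  // The case from the commit message: estimated trip count 10, unroll count 4,
  // no runtime remainder loop -> 3, not 2.
  std::printf("%u\n", newEstimatedTripCount(10, 4, /*Runtime=*/false));
}
```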
[llvm-branch-commits] [llvm] [DA] Add test where WeakCrossingSIV misses dependency due to overflow (NFC) (PR #158281)
https://github.com/kasuga-fj updated https://github.com/llvm/llvm-project/pull/158281 >From a42c8002548c97d6c7755b1db821a5c682881efe Mon Sep 17 00:00:00 2001 From: Ryotaro Kasuga Date: Fri, 12 Sep 2025 11:06:39 + Subject: [PATCH] [DA] Add test where WeakCrossingSIV misses dependency due to overflow --- .../DependenceAnalysis/WeakCrossingSIV.ll | 224 ++ 1 file changed, 224 insertions(+) diff --git a/llvm/test/Analysis/DependenceAnalysis/WeakCrossingSIV.ll b/llvm/test/Analysis/DependenceAnalysis/WeakCrossingSIV.ll index cd044032e34f1..58dded965de27 100644 --- a/llvm/test/Analysis/DependenceAnalysis/WeakCrossingSIV.ll +++ b/llvm/test/Analysis/DependenceAnalysis/WeakCrossingSIV.ll @@ -1,6 +1,8 @@ ; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5 ; RUN: opt < %s -disable-output "-passes=print" -aa-pipeline=basic-aa 2>&1 \ ; RUN: | FileCheck %s +; RUN: opt < %s -disable-output "-passes=print" -da-run-siv-routines-only 2>&1 \ +; RUN: | FileCheck %s --check-prefix=CHECK-SIV-ONLY ; ModuleID = 'WeakCrossingSIV.bc' target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" @@ -26,6 +28,20 @@ define void @weakcrossing0(ptr %A, ptr %B, i64 %n) nounwind uwtable ssp { ; CHECK-NEXT: Src: store i32 %0, ptr %B.addr.02, align 4 --> Dst: store i32 %0, ptr %B.addr.02, align 4 ; CHECK-NEXT:da analyze - none! ; +; CHECK-SIV-ONLY-LABEL: 'weakcrossing0' +; CHECK-SIV-ONLY-NEXT: Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: store i32 %conv, ptr %arrayidx, align 4 +; CHECK-SIV-ONLY-NEXT:da analyze - none! +; CHECK-SIV-ONLY-NEXT: Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: %0 = load i32, ptr %arrayidx2, align 4 +; CHECK-SIV-ONLY-NEXT:da analyze - flow [0|<]! +; CHECK-SIV-ONLY-NEXT: Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: store i32 %0, ptr %B.addr.02, align 4 +; CHECK-SIV-ONLY-NEXT:da analyze - confused! +; CHECK-SIV-ONLY-NEXT: Src: %0 = load i32, ptr %arrayidx2, align 4 --> Dst: %0 = load i32, ptr %arrayidx2, align 4 +; CHECK-SIV-ONLY-NEXT:da analyze - none! +; CHECK-SIV-ONLY-NEXT: Src: %0 = load i32, ptr %arrayidx2, align 4 --> Dst: store i32 %0, ptr %B.addr.02, align 4 +; CHECK-SIV-ONLY-NEXT:da analyze - confused! +; CHECK-SIV-ONLY-NEXT: Src: store i32 %0, ptr %B.addr.02, align 4 --> Dst: store i32 %0, ptr %B.addr.02, align 4 +; CHECK-SIV-ONLY-NEXT:da analyze - none! +; entry: %cmp1 = icmp eq i64 %n, 0 br i1 %cmp1, label %for.end, label %for.body.preheader @@ -79,6 +95,21 @@ define void @weakcrossing1(ptr %A, ptr %B, i64 %n) nounwind uwtable ssp { ; CHECK-NEXT: Src: store i32 %0, ptr %B.addr.02, align 4 --> Dst: store i32 %0, ptr %B.addr.02, align 4 ; CHECK-NEXT:da analyze - none! ; +; CHECK-SIV-ONLY-LABEL: 'weakcrossing1' +; CHECK-SIV-ONLY-NEXT: Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: store i32 %conv, ptr %arrayidx, align 4 +; CHECK-SIV-ONLY-NEXT:da analyze - none! +; CHECK-SIV-ONLY-NEXT: Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: %0 = load i32, ptr %arrayidx2, align 4 +; CHECK-SIV-ONLY-NEXT:da analyze - flow [<>] splitable! +; CHECK-SIV-ONLY-NEXT:da analyze - split level = 1, iteration = 0! +; CHECK-SIV-ONLY-NEXT: Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: store i32 %0, ptr %B.addr.02, align 4 +; CHECK-SIV-ONLY-NEXT:da analyze - confused! 
+; CHECK-SIV-ONLY-NEXT: Src: %0 = load i32, ptr %arrayidx2, align 4 --> Dst: %0 = load i32, ptr %arrayidx2, align 4 +; CHECK-SIV-ONLY-NEXT:da analyze - none! +; CHECK-SIV-ONLY-NEXT: Src: %0 = load i32, ptr %arrayidx2, align 4 --> Dst: store i32 %0, ptr %B.addr.02, align 4 +; CHECK-SIV-ONLY-NEXT:da analyze - confused! +; CHECK-SIV-ONLY-NEXT: Src: store i32 %0, ptr %B.addr.02, align 4 --> Dst: store i32 %0, ptr %B.addr.02, align 4 +; CHECK-SIV-ONLY-NEXT:da analyze - none! +; entry: %cmp1 = icmp eq i64 %n, 0 br i1 %cmp1, label %for.end, label %for.body.preheader @@ -130,6 +161,20 @@ define void @weakcrossing2(ptr %A, ptr %B, i64 %n) nounwind uwtable ssp { ; CHECK-NEXT: Src: store i32 %0, ptr %B.addr.01, align 4 --> Dst: store i32 %0, ptr %B.addr.01, align 4 ; CHECK-NEXT:da analyze - none! ; +; CHECK-SIV-ONLY-LABEL: 'weakcrossing2' +; CHECK-SIV-ONLY-NEXT: Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: store i32 %conv, ptr %arrayidx, align 4 +; CHECK-SIV-ONLY-NEXT:da analyze - none! +; CHECK-SIV-ONLY-NEXT: Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: %0 = load i32, ptr %arrayidx1, align 4 +; CHECK-SIV-ONLY-NEXT:da analyze - none! +; CHECK-SIV-ONLY-NEXT: Src: store i32 %conv, ptr %arrayidx, align 4 --> Dst: store i32 %0, ptr %B.addr.01, align 4 +; CHECK-SIV-ONLY-NEXT:da analyze - confused! +; CHECK-SIV-ONLY-NEXT: Src: %0 = load i32, ptr %arrayidx1, align 4 --> Dst: %0 = load i32
[llvm-branch-commits] [llvm] [DA] Add overflow check in ExactSIV (PR #157086)
https://github.com/kasuga-fj updated https://github.com/llvm/llvm-project/pull/157086 >From 9f8794a071e152cf128dc03d9994c884fecf5d12 Mon Sep 17 00:00:00 2001 From: Ryotaro Kasuga Date: Fri, 5 Sep 2025 11:41:29 + Subject: [PATCH 1/2] [DA] Add overflow check in ExactSIV --- llvm/lib/Analysis/DependenceAnalysis.cpp | 14 +- llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll | 2 +- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp index 0f77a1410e83b..6e576e866b310 100644 --- a/llvm/lib/Analysis/DependenceAnalysis.cpp +++ b/llvm/lib/Analysis/DependenceAnalysis.cpp @@ -1170,6 +1170,15 @@ const SCEVConstant *DependenceInfo::collectConstantUpperBound(const Loop *L, return nullptr; } +/// Returns \p A - \p B if it guaranteed not to signed wrap. Otherwise returns +/// nullptr. \p A and \p B must have the same integer type. +static const SCEV *minusSCEVNoSignedOverflow(const SCEV *A, const SCEV *B, + ScalarEvolution &SE) { + if (SE.willNotOverflow(Instruction::Sub, /*Signed=*/true, A, B)) +return SE.getMinusSCEV(A, B); + return nullptr; +} + // testZIV - // When we have a pair of subscripts of the form [c1] and [c2], // where c1 and c2 are both loop invariant, we attack it using @@ -1626,7 +1635,9 @@ bool DependenceInfo::exactSIVtest(const SCEV *SrcCoeff, const SCEV *DstCoeff, assert(0 < Level && Level <= CommonLevels && "Level out of range"); Level--; Result.Consistent = false; - const SCEV *Delta = SE->getMinusSCEV(DstConst, SrcConst); + const SCEV *Delta = minusSCEVNoSignedOverflow(DstConst, SrcConst, *SE); + if (!Delta) +return false; LLVM_DEBUG(dbgs() << "\tDelta = " << *Delta << "\n"); NewConstraint.setLine(SrcCoeff, SE->getNegativeSCEV(DstCoeff), Delta, CurLoop); @@ -1716,6 +1727,7 @@ bool DependenceInfo::exactSIVtest(const SCEV *SrcCoeff, const SCEV *DstCoeff, // explore directions unsigned NewDirection = Dependence::DVEntry::NONE; APInt LowerDistance, UpperDistance; + // TODO: Overflow check may be needed. if (TA.sgt(TB)) { LowerDistance = (TY - TX) + (TA - TB) * TL; UpperDistance = (TY - TX) + (TA - TB) * TU; diff --git a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll index 2a809c32d7d21..e8e7cb11bb23e 100644 --- a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll +++ b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll @@ -841,7 +841,7 @@ define void @exact14(ptr %A) { ; CHECK-SIV-ONLY-NEXT: Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 0, ptr %idx.0, align 1 ; CHECK-SIV-ONLY-NEXT:da analyze - none! ; CHECK-SIV-ONLY-NEXT: Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 1, ptr %idx.1, align 1 -; CHECK-SIV-ONLY-NEXT:da analyze - none! +; CHECK-SIV-ONLY-NEXT:da analyze - output [*|<]! ; CHECK-SIV-ONLY-NEXT: Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 1, ptr %idx.1, align 1 ; CHECK-SIV-ONLY-NEXT:da analyze - none! 
; >From a34c3208d903906caf5b9435f1705f695a68277e Mon Sep 17 00:00:00 2001 From: Ryotaro Kasuga Date: Tue, 16 Sep 2025 13:12:16 + Subject: [PATCH 2/2] fix comment --- llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll index e8e7cb11bb23e..6f33e2314ffba 100644 --- a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll +++ b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll @@ -815,8 +815,8 @@ for.end: ; preds = %for.body ;; A[3*i - 2] = 1; ;; } ;; -;; FIXME: DependencyAnalsysis currently detects no dependency between -;; `A[-6*i + INT64_MAX]` and `A[3*i - 2]`, but it does exist. For example, +;; There exists dependency between `A[-6*i + INT64_MAX]` and `A[3*i - 2]`. +;; For example, ;; ;; | memory location| -6*i + INT64_MAX | 3*i - 2 ;; |||---
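A minimal model of the guard `minusSCEVNoSignedOverflow` adds, using a compiler builtin in place of ScalarEvolution::willNotOverflow (illustration only, not the DependenceAnalysis code):

```cpp
#include <cstdint>
#include <cstdio>
#include <optional>

std::optional<int64_t> minusNoSignedOverflow(int64_t A, int64_t B) {
  int64_t Result;
  if (__builtin_sub_overflow(A, B, &Result))
    return std::nullopt; // bail out, as exactSIVtest now does for Delta
  return Result;
}

int main() {
  // INT64_MIN - 1 would wrap, so the analysis must give up instead of
  // reasoning about a wrapped Delta and missing a dependence.
  if (!minusNoSignedOverflow(INT64_MIN, 1))
    std::puts("overflow: dependence test answers conservatively");
}
```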
[llvm-branch-commits] [llvm] [DA] Add test where ExactSIV misses dependency due to overflow (NFC) (PR #157085)
https://github.com/kasuga-fj updated https://github.com/llvm/llvm-project/pull/157085 >From 4e43533b48aa613b05fb0753ac290809da8f28d1 Mon Sep 17 00:00:00 2001 From: Ryotaro Kasuga Date: Fri, 5 Sep 2025 11:32:54 + Subject: [PATCH 1/2] [DA] Add test where ExactSIV misses dependency due to overflow (NFC) --- .../Analysis/DependenceAnalysis/ExactSIV.ll | 120 ++ 1 file changed, 120 insertions(+) diff --git a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll index 0fe62991fede9..a16751397c487 100644 --- a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll +++ b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll @@ -807,3 +807,123 @@ for.body: ; preds = %entry, %for.body for.end: ; preds = %for.body ret void } + +;; max_i = INT64_MAX/6 // 1537228672809129301 +;; for (long long i = 0; i <= max_i; i++) { +;; A[-6*i + INT64_MAX] = 0; +;; if (i) +;; A[3*i - 2] = 1; +;; } +;; +;; FIXME: There is a loop-carried dependency between +;; `A[-6*i + INT64_MAX]` and `A[3*i - 2]`. For example, +;; +;; | memory location| -6*i + INT64_MAX | 3*i - 2 +;; |||--- +;; | A[1] | i = max_i | i = 1 +;; | A[4611686018427387901] | i = 768614336404564651 | i = max_i +;; +;; Actually, +;; * 1 = -6*max_i + INT64_MAX = 3*1 - 2 +;; * 4611686018427387901 = -6*768614336404564651 + INT64_MAX = 3*max_i - 2 +;; + +define void @exact14(ptr %A) { +; CHECK-LABEL: 'exact14' +; CHECK-NEXT: Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 0, ptr %idx.0, align 1 +; CHECK-NEXT:da analyze - none! +; CHECK-NEXT: Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 1, ptr %idx.1, align 1 +; CHECK-NEXT:da analyze - none! +; CHECK-NEXT: Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 1, ptr %idx.1, align 1 +; CHECK-NEXT:da analyze - none! +; +; CHECK-SIV-ONLY-LABEL: 'exact14' +; CHECK-SIV-ONLY-NEXT: Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 0, ptr %idx.0, align 1 +; CHECK-SIV-ONLY-NEXT:da analyze - none! +; CHECK-SIV-ONLY-NEXT: Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 1, ptr %idx.1, align 1 +; CHECK-SIV-ONLY-NEXT:da analyze - none! +; CHECK-SIV-ONLY-NEXT: Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 1, ptr %idx.1, align 1 +; CHECK-SIV-ONLY-NEXT:da analyze - none! +; +entry: + br label %loop.header + +loop.header: + %i = phi i64 [ 0, %entry ], [ %i.inc, %loop.latch ] + %subscript.0 = phi i64 [ 9223372036854775807, %entry ], [ %subscript.0.next, %loop.latch ] + %subscript.1 = phi i64 [ -2, %entry ], [ %subscript.1.next, %loop.latch ] + %idx.0 = getelementptr inbounds i8, ptr %A, i64 %subscript.0 + store i8 0, ptr %idx.0 + %cond.store = icmp ne i64 %i, 0 + br i1 %cond.store, label %if.store, label %loop.latch + +if.store: + %idx.1 = getelementptr inbounds i8, ptr %A, i64 %subscript.1 + store i8 1, ptr %idx.1 + br label %loop.latch + +loop.latch: + %i.inc = add nuw nsw i64 %i, 1 + %subscript.0.next = add nsw i64 %subscript.0, -6 + %subscript.1.next = add nsw i64 %subscript.1, 3 + %exitcond = icmp sgt i64 %i.inc, 1537228672809129301 + br i1 %exitcond, label %exit, label %loop.header + +exit: + ret void +} + +;; A generalized version of @exact14. +;; +;; for (long long i = 0; i <= n / 6; i++) { +;; A[-6*i + n] = 0; +;; if (i) +;; A[3*i - 2] = 1; +;; } + +define void @exact15(ptr %A, i64 %n) { +; CHECK-LABEL: 'exact15' +; CHECK-NEXT: Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 0, ptr %idx.0, align 1 +; CHECK-NEXT:da analyze - none! 
+; CHECK-NEXT: Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 1, ptr %idx.1, align 1 +; CHECK-NEXT:da analyze - output [*|<]! +; CHECK-NEXT: Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 1, ptr %idx.1, align 1 +; CHECK-NEXT:da analyze - none! +; +; CHECK-SIV-ONLY-LABEL: 'exact15' +; CHECK-SIV-ONLY-NEXT: Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 0, ptr %idx.0, align 1 +; CHECK-SIV-ONLY-NEXT:da analyze - none! +; CHECK-SIV-ONLY-NEXT: Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 1, ptr %idx.1, align 1 +; CHECK-SIV-ONLY-NEXT:da analyze - output [*|<]! +; CHECK-SIV-ONLY-NEXT: Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 1, ptr %idx.1, align 1 +; CHECK-SIV-ONLY-NEXT:da analyze - none! +; +entry: + %bound = sdiv i64 %n, 6 + %guard = icmp sgt i64 %n, 0 + br i1 %guard, label %loop.header, label %exit + +loop.header: + %i = phi i64 [ 0, %entry ], [ %i.inc, %loop.latch ] + %subscript.0 = phi i64 [ %n, %entry ], [ %subscript.0.next, %loop.latch ] + %subscript.1 = phi i64 [ -2, %entry ], [ %subscript.1.next, %loop.latch ] + %idx.0 = getelementptr inbounds i8, ptr %A, i64 %subscript.0 + store i8 0, ptr %idx.0 + %cond.store = icmp
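The collision table in the exact14 comment can be verified with plain integer arithmetic (standalone check, nothing here is LLVM API):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  const int64_t MaxI = INT64_MAX / 6; // 1537228672809129301
  // A[1]: subscript -6*i + INT64_MAX at i = MaxI equals 3*i - 2 at i = 1.
  std::printf("%lld == %lld\n", (long long)(-6 * MaxI + INT64_MAX), 3LL * 1 - 2);
  // A[4611686018427387901]: i = 768614336404564651 vs. i = MaxI.
  std::printf("%lld == %lld\n",
              (long long)(-6 * 768614336404564651LL + INT64_MAX),
              (long long)(3 * MaxI - 2));
}
```

None of the intermediate products wraps, which is why the dependency is real even though the two subscript expressions look far apart.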
[llvm-branch-commits] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)
tobias-stadler wrote: It would be good to change the testing methodology here. Currently all the dsymutil tests are blobs. We should be able to get remarks and .o files from llc. However, we need to link the .o files into a binary. Do you know of a way to do this with the available llvm tools? https://github.com/llvm/llvm-project/pull/156715
[llvm-branch-commits] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)
tobias-stadler wrote: Until we figure out a better testing methodology for dsymutil, I'd like to land this with the blob tests to unblock further work on the remarks infra. https://github.com/llvm/llvm-project/pull/156715
[llvm-branch-commits] [llvm] PPC: Replace PointerLikeRegClass with RegClassByHwMode (PR #158777)
@@ -902,7 +908,9 @@ def memri34_pcrel : Operand<iPTR> { // memri, imm is a 34-bit value. def PPCRegGxRCOperand : AsmOperandClass { let Name = "RegGxRC"; let PredicateMethod = "isRegNumber"; } -def ptr_rc_idx : Operand<iPTR>, PointerLikeRegClass<0> { +def ptr_rc_idx : Operand<iPTR>, s-barannikov wrote: This one is still using double inheritance. https://github.com/llvm/llvm-project/pull/158777
[llvm-branch-commits] [llvm] c4a5c58 - Revert "AMDGPU/GlobalISel: Import D16 load patterns and add combines for them…"
Author: Petar Avramovic Date: 2025-09-11T12:48:18+02:00 New Revision: c4a5c5809defb97fd1b757694d71bb7aa0978544 URL: https://github.com/llvm/llvm-project/commit/c4a5c5809defb97fd1b757694d71bb7aa0978544 DIFF: https://github.com/llvm/llvm-project/commit/c4a5c5809defb97fd1b757694d71bb7aa0978544.diff LOG: Revert "AMDGPU/GlobalISel: Import D16 load patterns and add combines for them…" This reverts commit b97010865caa0439d4cedc45e9582e645816519f. Added: Modified: llvm/lib/Target/AMDGPU/AMDGPUCombine.td llvm/lib/Target/AMDGPU/AMDGPUGISel.td llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp llvm/lib/Target/AMDGPU/SIInstructions.td llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_flat.ll llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_global.ll llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_load_local_2.ll llvm/test/CodeGen/AMDGPU/global-saddr-load.ll Removed: llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td index e8b211f7866ad..b5dac95b57a2d 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td @@ -71,12 +71,6 @@ def int_minmax_to_med3 : GICombineRule< [{ return matchIntMinMaxToMed3(*${min_or_max}, ${matchinfo}); }]), (apply [{ applyMed3(*${min_or_max}, ${matchinfo}); }])>; -let Predicates = [Predicate<"Subtarget->d16PreservesUnusedBits()">] in -def d16_load : GICombineRule< - (defs root:$bitcast), - (combine (G_BITCAST $dst, $src):$bitcast, - [{ return combineD16Load(*${bitcast} ); }])>; - def fp_minmax_to_med3 : GICombineRule< (defs root:$min_or_max, med3_matchdata:$matchinfo), (match (wip_match_opcode G_FMAXNUM, @@ -225,6 +219,5 @@ def AMDGPURegBankCombiner : GICombiner< zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain, fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp, identity_combines, redundant_and, constant_fold_cast_op, - cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines, - d16_load]> { + cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines]> { } diff --git a/llvm/lib/Target/AMDGPU/AMDGPUGISel.td b/llvm/lib/Target/AMDGPU/AMDGPUGISel.td index bb4bf742fb861..0c112d1787c1a 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPUGISel.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUGISel.td @@ -315,13 +315,6 @@ def : GINodeEquiv; def : GINodeEquiv; def : GINodeEquiv; -def : GINodeEquiv; -def : GINodeEquiv; -def : GINodeEquiv; -def : GINodeEquiv; -def : GINodeEquiv; -def : GINodeEquiv; - def : GINodeEquiv; // G_AMDGPU_WHOLE_WAVE_FUNC_RETURN is simpler than AMDGPUwhole_wave_return, // so we don't mark it as equivalent. 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp index fd604e1b19cd4..ee324a5e93f0f 100644 --- a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp +++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp @@ -89,10 +89,6 @@ class AMDGPURegBankCombinerImpl : public Combiner { void applyCanonicalizeZextShiftAmt(MachineInstr &MI, MachineInstr &Ext) const; - bool combineD16Load(MachineInstr &MI) const; - bool applyD16Load(unsigned D16Opc, MachineInstr &DstMI, -MachineInstr *SmallLoad, Register ToOverwriteD16) const; - private: SIModeRegisterDefaults getMode() const; bool getIEEE() const; @@ -396,88 +392,6 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt( MI.eraseFromParent(); } -bool AMDGPURegBankCombinerImpl::combineD16Load(MachineInstr &MI) const { - Register Dst; - MachineInstr *Load, *SextLoad; - const int64_t CleanLo16 = 0x; - const int64_t CleanHi16 = 0x; - - // Load lo - if (mi_match(MI.getOperand(1).getReg(), MRI, - m_GOr(m_GAnd(m_GBitcast(m_Reg(Dst)), -m_Copy(m_SpecificICst(CleanLo16))), - m_MInstr(Load { - -if (Load->getOpcode() == AMDGPU::G_ZEXTLOAD) { - const MachineMemOperand *MMO = *Load->memoperands_begin(); - unsigned LoadSize = MMO->getSizeInBits().getValue(); - if (LoadSize == 8) -return applyD16Load(AMDGPU::G_AMDGPU_LOAD_D16_LO_U8, MI, Load, Dst); - if (LoadSize == 16) -return applyD16Load(AMDGPU::G_AMDGPU_LOAD_D16_LO, MI, Load, Dst); - return false; -} - -if (mi_match( -Load, MRI, -m_GAnd(m_MInstr(SextLoad), m_Copy(m_SpecificICst(CleanHi16) { - if (SextLoad->getOpcode() != AMDGPU::G_SEXTLOAD) -return false; - - const MachineMemOperand *MMO = *SextLoad->memoperands_begin(); - if (MMO->getSizeInBits().getValue() != 8) -return false; - - return applyD16Load(AMDGPU::G_AMDGPU_LOAD_D16_LO_I8, MI, SextLoad, Dst); -} - -return false; - } - - // Load hi - if (mi_match(MI.getOperand(1).getR
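For context, a scalar model of what the reverted combine was recognizing: a D16 load writes one 16-bit half of a 32-bit register and preserves the other half. The mask values below (0xFFFF0000 / 0x0000FFFF) are assumptions matching the CleanLo16/CleanHi16 names, since the constants are elided in the quoted diff:

```cpp
#include <cstdint>
#include <cstdio>

uint32_t loadD16Lo(uint32_t Dst, uint16_t Loaded) {
  return (Dst & 0xFFFF0000u) | Loaded; // clean the low half, insert the load
}

uint32_t loadD16Hi(uint32_t Dst, uint16_t Loaded) {
  return (Dst & 0x0000FFFFu) | ((uint32_t)Loaded << 16); // clean the high half
}

int main() {
  std::printf("%08x\n", (unsigned)loadD16Lo(0xAAAABBBB, 0x1234)); // aaaa1234
  std::printf("%08x\n", (unsigned)loadD16Hi(0xAAAABBBB, 0x1234)); // 1234bbbb
}
```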
[llvm-branch-commits] [llvm] [IR2Vec] Refactor vocabulary to use section-based storage (PR #158376)
@@ -144,6 +145,73 @@ struct Embedding { using InstEmbeddingsMap = DenseMap<const Instruction *, Embedding>; using BBEmbeddingsMap = DenseMap<const BasicBlock *, Embedding>; +/// Generic storage class for section-based vocabularies. +/// VocabStorage provides a generic foundation for storing and accessing +/// embeddings organized into sections. +class VocabStorage { +private: + /// Section-based storage + std::vector<std::vector<Embedding>> Sections; + + size_t TotalSize = 0; + unsigned Dimension = 0; + +public: + /// Default constructor creates empty storage (invalid state) + VocabStorage() : Sections(), TotalSize(0), Dimension(0) {} + + /// Create a VocabStorage with pre-organized section data + VocabStorage(std::vector<std::vector<Embedding>> &&SectionData); + + VocabStorage(VocabStorage &&) = default; + VocabStorage &operator=(VocabStorage &&Other); + + VocabStorage(const VocabStorage &) = delete; + VocabStorage &operator=(const VocabStorage &) = delete; + + /// Get total number of entries across all sections + size_t size() const { return TotalSize; } + + /// Get number of sections + unsigned getNumSections() const { +return static_cast<unsigned>(Sections.size()); + } + + /// Section-based access: Storage[sectionId][localIndex] + const std::vector<Embedding> &operator[](unsigned SectionId) const { +assert(SectionId < Sections.size() && "Invalid section ID"); +return Sections[SectionId]; + } + + /// Get vocabulary dimension + unsigned getDimension() const { return Dimension; } + + /// Check if vocabulary is valid (has data) + bool isValid() const { return TotalSize > 0; } + + /// Iterator support for section-based access svkeerthy wrote: Having this iterator makes it easy to iterate over the vocabulary (we iterate over the vocabulary in the tool). https://github.com/llvm/llvm-project/pull/158376
[llvm-branch-commits] [llvm] c5b5583 - Revert "[NFCI][Globals] In GlobalObjects::setSectionPrefix, do conditional up…"
Author: Mingming Liu Date: 2025-09-16T12:51:22-07:00 New Revision: c5b558385b956faf99348b3f0de91926061afcfb URL: https://github.com/llvm/llvm-project/commit/c5b558385b956faf99348b3f0de91926061afcfb DIFF: https://github.com/llvm/llvm-project/commit/c5b558385b956faf99348b3f0de91926061afcfb.diff LOG: Revert "[NFCI][Globals] In GlobalObjects::setSectionPrefix, do conditional up…" This reverts commit 027bccc4692923d0f1ba3d4d970071f747c2255c. Added: Modified: llvm/include/llvm/IR/GlobalObject.h llvm/lib/CodeGen/CodeGenPrepare.cpp llvm/lib/CodeGen/StaticDataAnnotator.cpp llvm/lib/IR/Globals.cpp llvm/lib/Transforms/Instrumentation/MemProfUse.cpp llvm/unittests/IR/CMakeLists.txt Removed: llvm/unittests/IR/GlobalObjectTest.cpp diff --git a/llvm/include/llvm/IR/GlobalObject.h b/llvm/include/llvm/IR/GlobalObject.h index e273387807cf6..08a02b42bdc14 100644 --- a/llvm/include/llvm/IR/GlobalObject.h +++ b/llvm/include/llvm/IR/GlobalObject.h @@ -121,10 +121,8 @@ class GlobalObject : public GlobalValue { /// appropriate default object file section. LLVM_ABI void setSection(StringRef S); - /// If existing prefix is diff erent from \p Prefix, set it to \p Prefix. If \p - /// Prefix is empty, the set clears the existing metadata. Returns true if - /// section prefix changed and false otherwise. - LLVM_ABI bool setSectionPrefix(StringRef Prefix); + /// Set the section prefix for this global object. + LLVM_ABI void setSectionPrefix(StringRef Prefix); /// Get the section prefix for this global object. LLVM_ABI std::optional getSectionPrefix() const; diff --git a/llvm/lib/CodeGen/CodeGenPrepare.cpp b/llvm/lib/CodeGen/CodeGenPrepare.cpp index 92d87681c9adc..9db4c9e5e2807 100644 --- a/llvm/lib/CodeGen/CodeGenPrepare.cpp +++ b/llvm/lib/CodeGen/CodeGenPrepare.cpp @@ -583,23 +583,23 @@ bool CodeGenPrepare::_run(Function &F) { // if requested. if (BBSectionsGuidedSectionPrefix && BBSectionsProfileReader && BBSectionsProfileReader->isFunctionHot(F.getName())) { -EverMadeChange |= F.setSectionPrefix("hot"); +F.setSectionPrefix("hot"); } else if (ProfileGuidedSectionPrefix) { // The hot attribute overwrites profile count based hotness while profile // counts based hotness overwrite the cold attribute. // This is a conservative behabvior. if (F.hasFnAttribute(Attribute::Hot) || PSI->isFunctionHotInCallGraph(&F, *BFI)) - EverMadeChange |= F.setSectionPrefix("hot"); + F.setSectionPrefix("hot"); // If PSI shows this function is not hot, we will placed the function // into unlikely section if (1) PSI shows this is a cold function, or // (2) the function has a attribute of cold. 
else if (PSI->isFunctionColdInCallGraph(&F, *BFI) || F.hasFnAttribute(Attribute::Cold)) - EverMadeChange |= F.setSectionPrefix("unlikely"); + F.setSectionPrefix("unlikely"); else if (ProfileUnknownInSpecialSection && PSI->hasPartialSampleProfile() && PSI->isFunctionHotnessUnknown(F)) - EverMadeChange |= F.setSectionPrefix("unknown"); + F.setSectionPrefix("unknown"); } /// This optimization identifies DIV instructions that can be diff --git a/llvm/lib/CodeGen/StaticDataAnnotator.cpp b/llvm/lib/CodeGen/StaticDataAnnotator.cpp index 53a9ab4dbda02..2d9b489a80acb 100644 --- a/llvm/lib/CodeGen/StaticDataAnnotator.cpp +++ b/llvm/lib/CodeGen/StaticDataAnnotator.cpp @@ -91,7 +91,8 @@ bool StaticDataAnnotator::runOnModule(Module &M) { if (SectionPrefix.empty()) continue; -Changed |= GV.setSectionPrefix(SectionPrefix); +GV.setSectionPrefix(SectionPrefix); +Changed = true; } return Changed; diff --git a/llvm/lib/IR/Globals.cpp b/llvm/lib/IR/Globals.cpp index 1a7a5c5fbad6b..11d33e262fecb 100644 --- a/llvm/lib/IR/Globals.cpp +++ b/llvm/lib/IR/Globals.cpp @@ -288,22 +288,10 @@ void GlobalObject::setSection(StringRef S) { setGlobalObjectFlag(HasSectionHashEntryBit, !S.empty()); } -bool GlobalObject::setSectionPrefix(StringRef Prefix) { - StringRef ExistingPrefix; - if (std::optional MaybePrefix = getSectionPrefix()) -ExistingPrefix = *MaybePrefix; - - if (ExistingPrefix == Prefix) -return false; - - if (Prefix.empty()) { -setMetadata(LLVMContext::MD_section_prefix, nullptr); -return true; - } +void GlobalObject::setSectionPrefix(StringRef Prefix) { MDBuilder MDB(getContext()); setMetadata(LLVMContext::MD_section_prefix, MDB.createGlobalObjectSectionPrefix(Prefix)); - return true; } std::optional GlobalObject::getSectionPrefix() const { diff --git a/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp b/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp index c86092bd51eda..ecb2f2dbc552b 100644 --- a/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp +++ b/llvm/lib/Transforms/Instrumentation/MemProf
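The reverted change follows a common conditional-update pattern; a standalone sketch (the simplified `Obj` stands in for GlobalObject, with the metadata machinery reduced to an optional string):

```cpp
#include <optional>
#include <string>

struct Obj {
  std::optional<std::string> Prefix;

  // Only touch state when the prefix actually changes; report whether it did.
  bool setSectionPrefix(const std::string &NewPrefix) {
    std::string Existing = Prefix.value_or("");
    if (Existing == NewPrefix)
      return false; // no-op, caller's EverMadeChange stays accurate
    if (NewPrefix.empty())
      Prefix.reset(); // clearing existing metadata counts as a change
    else
      Prefix = NewPrefix;
    return true;
  }
};

int main() {
  Obj O;
  bool Changed = O.setSectionPrefix("hot"); // true: prefix created
  Changed = O.setSectionPrefix("hot");      // false: nothing to update
  return Changed ? 1 : 0;
}
```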
[llvm-branch-commits] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)
https://github.com/tobias-stadler edited https://github.com/llvm/llvm-project/pull/156715
[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)
https://github.com/abidh approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/156837
[llvm-branch-commits] [llvm] PPC: Replace PointerLikeRegClass with RegClassByHwMode (PR #158777)
@@ -868,10 +868,16 @@ def crbitm: Operand<i8> { def PPCRegGxRCNoR0Operand : AsmOperandClass { let Name = "RegGxRCNoR0"; let PredicateMethod = "isRegNumber"; } -def ptr_rc_nor0 : Operand<iPTR>, PointerLikeRegClass<1> { + +def ptr_rc_nor0 : RegClassByHwMode< + [PPC32, PPC64], + [GPRC_NOR0, G8RC_NOX0]>; + +def PtrOpNoR0 : RegisterOperand<ptr_rc_nor0> { s-barannikov wrote: Maybe swap the names of RegClassByHwMode / RegisterOperand to reduce diff? https://github.com/llvm/llvm-project/pull/158777
[llvm-branch-commits] [clang] port 5b4819e to release (PR #159209)
https://github.com/dwblaikie edited https://github.com/llvm/llvm-project/pull/159209
[llvm-branch-commits] [clang] port 5b4819e to release (PR #159209)
https://github.com/dwblaikie ready_for_review https://github.com/llvm/llvm-project/pull/159209
[llvm-branch-commits] [NFC][CodeGen][CFI] Pre-commit transparent_union tests (PR #158192)
https://github.com/vitalybuka edited https://github.com/llvm/llvm-project/pull/158192
[llvm-branch-commits] [llvm] [IR2Vec] Refactor vocabulary to use section-based storage (PR #158376)
https://github.com/svkeerthy updated https://github.com/llvm/llvm-project/pull/158376 >From 763b16710251eb055b0b192051069cbc838dd7d4 Mon Sep 17 00:00:00 2001 From: svkeerthy Date: Fri, 12 Sep 2025 22:06:44 + Subject: [PATCH] VocabStorage --- llvm/include/llvm/Analysis/IR2Vec.h | 145 +++-- llvm/lib/Analysis/IR2Vec.cpp | 215 + llvm/lib/Analysis/InlineAdvisor.cpp | 2 +- llvm/tools/llvm-ir2vec/llvm-ir2vec.cpp| 6 +- .../FunctionPropertiesAnalysisTest.cpp| 13 +- llvm/unittests/Analysis/IR2VecTest.cpp| 294 +++--- 6 files changed, 541 insertions(+), 134 deletions(-) diff --git a/llvm/include/llvm/Analysis/IR2Vec.h b/llvm/include/llvm/Analysis/IR2Vec.h index 4a6db5d895a62..1d3f87e47d269 100644 --- a/llvm/include/llvm/Analysis/IR2Vec.h +++ b/llvm/include/llvm/Analysis/IR2Vec.h @@ -45,6 +45,7 @@ #include "llvm/Support/JSON.h" #include #include +#include namespace llvm { @@ -144,6 +145,73 @@ struct Embedding { using InstEmbeddingsMap = DenseMap; using BBEmbeddingsMap = DenseMap; +/// Generic storage class for section-based vocabularies. +/// VocabStorage provides a generic foundation for storing and accessing +/// embeddings organized into sections. +class VocabStorage { +private: + /// Section-based storage + std::vector> Sections; + + const size_t TotalSize = 0; + const unsigned Dimension = 0; + +public: + /// Default constructor creates empty storage (invalid state) + VocabStorage() : Sections(), TotalSize(0), Dimension(0) {} + + /// Create a VocabStorage with pre-organized section data + VocabStorage(std::vector> &&SectionData); + + VocabStorage(VocabStorage &&) = default; + VocabStorage &operator=(VocabStorage &&) = delete; + + VocabStorage(const VocabStorage &) = delete; + VocabStorage &operator=(const VocabStorage &) = delete; + + /// Get total number of entries across all sections + size_t size() const { return TotalSize; } + + /// Get number of sections + unsigned getNumSections() const { +return static_cast(Sections.size()); + } + + /// Section-based access: Storage[sectionId][localIndex] + const std::vector &operator[](unsigned SectionId) const { +assert(SectionId < Sections.size() && "Invalid section ID"); +return Sections[SectionId]; + } + + /// Get vocabulary dimension + unsigned getDimension() const { return Dimension; } + + /// Check if vocabulary is valid (has data) + bool isValid() const { return TotalSize > 0; } + + /// Iterator support for section-based access + class const_iterator { +const VocabStorage *Storage; +unsigned SectionId = 0; +size_t LocalIndex = 0; + + public: +const_iterator(const VocabStorage *Storage, unsigned SectionId, + size_t LocalIndex) +: Storage(Storage), SectionId(SectionId), LocalIndex(LocalIndex) {} + +LLVM_ABI const Embedding &operator*() const; +LLVM_ABI const_iterator &operator++(); +LLVM_ABI bool operator==(const const_iterator &Other) const; +LLVM_ABI bool operator!=(const const_iterator &Other) const; + }; + + const_iterator begin() const { return const_iterator(this, 0, 0); } + const_iterator end() const { +return const_iterator(this, getNumSections(), 0); + } +}; + /// Class for storing and accessing the IR2Vec vocabulary. /// The Vocabulary class manages seed embeddings for LLVM IR entities. 
The /// seed embeddings are the initial learned representations of the entities @@ -164,7 +232,7 @@ using BBEmbeddingsMap = DenseMap; class Vocabulary { friend class llvm::IR2VecVocabAnalysis; - // Vocabulary Slot Layout: + // Vocabulary Layout: // ++--+ // | Entity Type| Index Range | // ++--+ @@ -175,8 +243,16 @@ class Vocabulary { // Note: "Similar" LLVM Types are grouped/canonicalized together. // Operands include Comparison predicates (ICmp/FCmp). // This can be extended to include other specializations in future. - using VocabVector = std::vector; - VocabVector Vocab; + enum class Section : unsigned { +Opcodes = 0, +CanonicalTypes = 1, +Operands = 2, +Predicates = 3, +MaxSections + }; + + // Use section-based storage for better organization and efficiency + VocabStorage Storage; static constexpr unsigned NumICmpPredicates = static_cast(CmpInst::LAST_ICMP_PREDICATE) - @@ -228,9 +304,18 @@ class Vocabulary { NumICmpPredicates + NumFCmpPredicates; Vocabulary() = default; - LLVM_ABI Vocabulary(VocabVector &&Vocab) : Vocab(std::move(Vocab)) {} + LLVM_ABI Vocabulary(VocabStorage &&Storage) : Storage(std::move(Storage)) {} + + Vocabulary(const Vocabulary &) = delete; + Vocabulary &operator=(const Vocabulary &) = delete; + + Vocabulary(Vocabulary &&) = default; + Vocabulary &op
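A standalone sketch of the section-based layout and the flat walk the new const_iterator enables (assumption: an Embedding is reduced to std::vector<double>, and `Storage`/`forEach` are illustrative names, not the patch's API):

```cpp
#include <cstdio>
#include <vector>

using Embedding = std::vector<double>;

struct Storage {
  // Sections[SectionId][LocalIndex], mirroring Opcodes/CanonicalTypes/
  // Operands/Predicates in the vocabulary layout.
  std::vector<std::vector<Embedding>> Sections;

  // Flat iteration in section order, as the vocabulary iterator does.
  template <typename Fn> void forEach(Fn F) const {
    for (const auto &Sec : Sections)
      for (const Embedding &E : Sec)
        F(E);
  }
};

int main() {
  Storage S{{{{1.0}, {2.0}}, {{3.0}}, {{4.0}}, {{5.0}}}};
  S.forEach([](const Embedding &E) { std::printf("%g ", E[0]); }); // 1 2 3 4 5
  std::printf("\n");
}
```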
[llvm-branch-commits] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)
jroelofs wrote: likewise. I’ll leave this “unresolved” so it doesn’t get hidden https://github.com/llvm/llvm-project/pull/156715
[llvm-branch-commits] [llvm] AMDGPU: Ensure both wavesize features are not set (PR #159234)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/159234
[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `reduce` on device (PR #156610)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/156610 >From bdd9ab29d7c0c57edc5b8848c7e4be5626b5f57e Mon Sep 17 00:00:00 2001 From: ergawy Date: Tue, 2 Sep 2025 08:36:34 -0500 Subject: [PATCH] [flang][OpenMP] `do concurrent`: support `reduce` on device Extends `do concurrent` to OpenMP device mapping by adding support for mapping `reduce` specifiers to omp `reduction` clauses. The changes attach 2 `reduction` clauses to the mapped OpenMP construct: one on the `teams` part of the construct and one on the `wloop` part. --- .../OpenMP/DoConcurrentConversion.cpp | 117 ++ .../DoConcurrent/reduce_device.mlir | 53 2 files changed, 121 insertions(+), 49 deletions(-) create mode 100644 flang/test/Transforms/DoConcurrent/reduce_device.mlir diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index d00a4fdd2cf2e..6e308499100fa 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -141,6 +141,9 @@ void collectLoopLiveIns(fir::DoConcurrentLoopOp loop, for (mlir::Value local : loop.getLocalVars()) liveIns.push_back(local); + + for (mlir::Value reduce : loop.getReduceVars()) +liveIns.push_back(reduce); } /// Collects values that are local to a loop: "loop-local values". A loop-local @@ -319,7 +322,7 @@ class DoConcurrentConversion targetOp = genTargetOp(doLoop.getLoc(), rewriter, mapper, loopNestLiveIns, targetClauseOps, loopNestClauseOps, liveInShapeInfoMap); - genTeamsOp(doLoop.getLoc(), rewriter); + genTeamsOp(rewriter, loop, mapper); } mlir::omp::ParallelOp parallelOp = @@ -492,46 +495,7 @@ class DoConcurrentConversion if (!mapToDevice) genPrivatizers(rewriter, mapper, loop, wsloopClauseOps); -if (!loop.getReduceVars().empty()) { - for (auto [op, byRef, sym, arg] : llvm::zip_equal( - loop.getReduceVars(), loop.getReduceByrefAttr().asArrayRef(), - loop.getReduceSymsAttr().getAsRange(), - loop.getRegionReduceArgs())) { -auto firReducer = moduleSymbolTable.lookup( -sym.getLeafReference()); - -mlir::OpBuilder::InsertionGuard guard(rewriter); -rewriter.setInsertionPointAfter(firReducer); -std::string ompReducerName = sym.getLeafReference().str() + ".omp"; - -auto ompReducer = -moduleSymbolTable.lookup( -rewriter.getStringAttr(ompReducerName)); - -if (!ompReducer) { - ompReducer = mlir::omp::DeclareReductionOp::create( - rewriter, firReducer.getLoc(), ompReducerName, - firReducer.getTypeAttr().getValue()); - - cloneFIRRegionToOMP(rewriter, firReducer.getAllocRegion(), - ompReducer.getAllocRegion()); - cloneFIRRegionToOMP(rewriter, firReducer.getInitializerRegion(), - ompReducer.getInitializerRegion()); - cloneFIRRegionToOMP(rewriter, firReducer.getReductionRegion(), - ompReducer.getReductionRegion()); - cloneFIRRegionToOMP(rewriter, firReducer.getAtomicReductionRegion(), - ompReducer.getAtomicReductionRegion()); - cloneFIRRegionToOMP(rewriter, firReducer.getCleanupRegion(), - ompReducer.getCleanupRegion()); - moduleSymbolTable.insert(ompReducer); -} - -wsloopClauseOps.reductionVars.push_back(op); -wsloopClauseOps.reductionByref.push_back(byRef); -wsloopClauseOps.reductionSyms.push_back( -mlir::SymbolRefAttr::get(ompReducer)); - } -} +genReductions(rewriter, mapper, loop, wsloopClauseOps); auto wsloopOp = mlir::omp::WsloopOp::create(rewriter, loop.getLoc(), wsloopClauseOps); @@ -553,8 +517,6 @@ class DoConcurrentConversion rewriter.setInsertionPointToEnd(&loopNestOp.getRegion().back()); mlir::omp::YieldOp::create(rewriter, 
loop->getLoc()); -loop->getParentOfType().print( -llvm::errs(), mlir::OpPrintingFlags().assumeVerified()); return {loopNestOp, wsloopOp}; } @@ -778,15 +740,26 @@ class DoConcurrentConversion liveInName, shape); } - mlir::omp::TeamsOp - genTeamsOp(mlir::Location loc, - mlir::ConversionPatternRewriter &rewriter) const { -auto teamsOp = rewriter.create( -loc, /*clauses=*/mlir::omp::TeamsOperands{}); + mlir::omp::TeamsOp genTeamsOp(mlir::ConversionPatternRewriter &rewriter, +fir::DoConcurrentLoopOp loop, +mlir::IRMapping &mapper) const { +mlir::omp::TeamsOperands teamsOps; +genReductions(rewriter, mapper, loop, teamsOps); + +mlir::Location loc = loop.getLoc(); +aut
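A minimal sketch of the clause-population step described in the commit message above; `appendReduction` is a hypothetical name, and `OpsTy` stands in for either clause-operand struct. The field names (`reductionVars`, `reductionByref`, `reductionSyms`) are the ones the diff itself uses, so one call per struct attaches the two `reduction` clauses (one for `teams`, one for the worksharing loop).

```cpp
#include "mlir/Dialect/OpenMP/OpenMPDialect.h"

// Hypothetical helper: OpsTy stands in for mlir::omp::TeamsOperands or
// mlir::omp::WsloopOperands, which share these reduction clause fields.
template <typename OpsTy>
static void appendReduction(OpsTy &clauseOps, mlir::Value var, bool byRef,
                            mlir::omp::DeclareReductionOp ompReducer) {
  clauseOps.reductionVars.push_back(var);
  clauseOps.reductionByref.push_back(byRef);
  clauseOps.reductionSyms.push_back(mlir::SymbolRefAttr::get(ompReducer));
}
```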
[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/156837 >From c5dde7cbcece549d0996a6671d1ae1b53b9cd63b Mon Sep 17 00:00:00 2001 From: ergawy Date: Thu, 4 Sep 2025 01:06:21 -0500 Subject: [PATCH 1/3] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU Fixes a bug related to insertion points when inlining multi-block combiner reduction regions. The IP at the end of the inlined region was not used resulting in emitting BBs with multiple terminators. --- llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 3 + .../omptarget-multi-block-reduction.mlir | 85 +++ 2 files changed, 88 insertions(+) create mode 100644 mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index 220eee3cb8b087..b516c3c3f4efee 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -3507,6 +3507,8 @@ Expected OpenMPIRBuilder::createReductionFunction( return AfterIP.takeError(); if (!Builder.GetInsertBlock()) return ReductionFunc; + + Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint()); Builder.CreateStore(Reduced, LHSPtr); } } @@ -3751,6 +3753,7 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createReductionsGPU( RI.ReductionGen(Builder.saveIP(), RHSValue, LHSValue, Reduced); if (!AfterIP) return AfterIP.takeError(); + Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint()); Builder.CreateStore(Reduced, LHS, false); } } diff --git a/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir new file mode 100644 index 00..aaf06d2d0e0c22 --- /dev/null +++ b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir @@ -0,0 +1,85 @@ +// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s + +// Verifies that the IR builder can handle reductions with multi-block combiner +// regions on the GPU. 
+ +module attributes {dlti.dl_spec = #dlti.dl_spec<"dlti.alloca_memory_space" = 5 : ui64, "dlti.global_memory_space" = 1 : ui64>, llvm.target_triple = "amdgcn-amd-amdhsa", omp.is_gpu = true, omp.is_target_device = true} { + llvm.func @bar() {} + llvm.func @baz() {} + + omp.declare_reduction @add_reduction_byref_box_5xf32 : !llvm.ptr alloc { +%0 = llvm.mlir.constant(1 : i64) : i64 +%1 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)> : (i64) -> !llvm.ptr<5> +%2 = llvm.addrspacecast %1 : !llvm.ptr<5> to !llvm.ptr +omp.yield(%2 : !llvm.ptr) + } init { + ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr): +omp.yield(%arg1 : !llvm.ptr) + } combiner { + ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr): +llvm.call @bar() : () -> () +llvm.br ^bb3 + + ^bb3: // pred: ^bb1 +llvm.call @baz() : () -> () +omp.yield(%arg0 : !llvm.ptr) + } + llvm.func @foo_() { +%c1 = llvm.mlir.constant(1 : i64) : i64 +%10 = llvm.alloca %c1 x !llvm.array<5 x f32> {bindc_name = "x"} : (i64) -> !llvm.ptr<5> +%11 = llvm.addrspacecast %10 : !llvm.ptr<5> to !llvm.ptr +%74 = omp.map.info var_ptr(%11 : !llvm.ptr, !llvm.array<5 x f32>) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = "x"} +omp.target map_entries(%74 -> %arg0 : !llvm.ptr) { + %c1_2 = llvm.mlir.constant(1 : i32) : i32 + %c10 = llvm.mlir.constant(10 : i32) : i32 + omp.teams reduction(byref @add_reduction_byref_box_5xf32 %arg0 -> %arg2 : !llvm.ptr) { +omp.parallel { + omp.distribute { +omp.wsloop { + omp.loop_nest (%arg5) : i32 = (%c1_2) to (%c10) inclusive step (%c1_2) { +omp.yield + } +} {omp.composite} + } {omp.composite} + omp.terminator +} {omp.composite} +omp.terminator + } + omp.terminator +} +llvm.return + } +} + +// CHECK: call void @__kmpc_parallel_51({{.*}}, i32 1, i32 -1, i32 -1, +// CHECK-SAME: ptr @[[PAR_OUTLINED:.*]], ptr null, ptr %2, i64 1) + +// CHECK: define internal void @[[PAR_OUTLINED]]{{.*}} { +// CHECK: .omp.reduction.then: +// CHECK: br label %omp.reduction.nonatomic.body + +// CHECK: omp.reduction.nonatomic.body: +// CHECK: call void @bar() +// CHECK: br label %[[BODY_2ND_BB:.*]] + +// CHECK: [[BODY_2ND_BB]]: +// CHECK: call void @baz() +// CHECK: br label %[[CONT_BB:.*]] + +// CHECK: [[CONT_BB]]: +// CHECK: br label %.omp.reduction.done +// CHECK: } + +// CHECK: define internal void @"{{.*}}$reduction$reduction_func"(ptr noundef %0, ptr noundef %1) #0 { +// CHECK: br label %omp.reduction.nonatomic.body + +// CHECK: [[BODY_2ND_BB:.*]]: +// CHECK: call void @baz() +// CHECK: br label %omp.region.cont + + +// CHECK: omp.reduction.nonatomic.body: +// CHECK: call voi
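A standalone sketch of the fix above, distilled from the diff: after the callback inlines a (possibly multi-block) combiner region, the builder must be re-seated at the continuation point the callback returns before the combined value is stored; otherwise the store lands in a block that already ends in a branch, producing a block with two terminators. `storeCombined` is a made-up name for illustration.

```cpp
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Sketch only: AfterIP is the insertion point reported by the region-inlining
// callback, i.e. the end of the inlined combiner. Emitting the store from the
// stale pre-inlining insertion point is exactly the bug class fixed here.
static void storeCombined(IRBuilderBase &Builder,
                          IRBuilderBase::InsertPoint AfterIP, Value *Reduced,
                          Value *LHSPtr) {
  Builder.SetInsertPoint(AfterIP.getBlock(), AfterIP.getPoint());
  Builder.CreateStore(Reduced, LHSPtr);
}
```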
[llvm-branch-commits] [llvm] [AMDGPU] Add aperture classes to VS_64 (PR #158823)
llvmbot wrote: @llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-amdgpu Author: Stanislav Mekhanoshin (rampitec) Changes Should not do anything. --- Patch is 85.72 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/158823.diff 10 Files Affected: - (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.td (+5-3) - (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll (+1-1) - (modified) llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir (+115-115) - (modified) llvm/test/CodeGen/AMDGPU/inflate-reg-class-vgpr-mfma-to-av-with-load-source.mir (+6-6) - (modified) llvm/test/CodeGen/AMDGPU/inline-asm.i128.ll (+12-12) - (modified) llvm/test/CodeGen/AMDGPU/partial-regcopy-and-spill-missed-at-regalloc.ll (+8-8) - (modified) llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir (+4-4) - (modified) llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-insert-extract.mir (+6-6) - (modified) llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-subreg-src2-chain.mir (+14-14) - (modified) llvm/test/CodeGen/AMDGPU/spill-vector-superclass.ll (+1-1) ``diff diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td index 7eccaafefc893..4e1876db41d3d 100644 --- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td +++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td @@ -1131,7 +1131,8 @@ def VS_32_Lo256 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v2 let Size = 32; } -def VS_64 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32, (add VReg_64, SReg_64)> { +def VS_64 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32, +(add VReg_64, SReg_64_Encodable)> { let isAllocatable = 0; let HasVGPR = 1; let HasSGPR = 1; @@ -1139,7 +1140,7 @@ def VS_64 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32, (add VReg_64, SReg_6 } def VS_64_Align2 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32, - (add VReg_64_Align2, SReg_64)> { + (add VReg_64_Align2, SReg_64_Encodable)> { let isAllocatable = 0; let HasVGPR = 1; let HasSGPR = 1; @@ -1153,7 +1154,8 @@ def AV_32 : SIRegisterClass<"AMDGPU", VGPR_32.RegTypes, 32, (add VGPR_32, AGPR_3 let Size = 32; } -def VS_64_Lo256 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32, (add VReg_64_Lo256_Align2, SReg_64)> { +def VS_64_Lo256 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32, + (add VReg_64_Lo256_Align2, SReg_64_Encodable)> { let isAllocatable = 0; let HasVGPR = 1; let HasSGPR = 1; diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll index f9d11cb23fa4e..2cde060529bec 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll @@ -136,7 +136,7 @@ define float @test_multiple_register_outputs_same() #0 { define double @test_multiple_register_outputs_mixed() #0 { ; CHECK-LABEL: name: test_multiple_register_outputs_mixed ; CHECK: bb.1 (%ir-block.0): - ; CHECK-NEXT: INLINEASM &"v_mov_b32 $0, 0; v_add_f64 $1, 0, 0", 0 /* attdialect */, 2031626 /* regdef:VGPR_32 */, def %8, 3670026 /* regdef:VReg_64 */, def %9 + ; CHECK-NEXT: INLINEASM &"v_mov_b32 $0, 0; v_add_f64 $1, 0, 0", 0 /* attdialect */, 2031626 /* regdef:VGPR_32 */, def %8, 3735562 /* regdef:VReg_64 */, def %9 ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY %8 ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s64) = COPY %9 ; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](s64) diff --git 
a/llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir b/llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir index 04cb0b14679bb..029aa3957d32b 100644 --- a/llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir +++ b/llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir @@ -20,13 +20,13 @@ body: | ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 ; CHECK-NEXT: undef [[COPY2:%[0-9]+]].sub0:areg_64 = COPY [[COPY]] ; CHECK-NEXT: [[COPY2:%[0-9]+]].sub1:areg_64 = COPY [[COPY1]] -; CHECK-NEXT: INLINEASM &"; use $0", 0 /* attdialect */, 4128777 /* reguse:AReg_64 */, [[COPY2]] +; CHECK-NEXT: INLINEASM &"; use $0", 0 /* attdialect */, 4325385 /* reguse:AReg_64 */, [[COPY2]] ; CHECK-NEXT: SI_RETURN %0:vgpr_32 = COPY $vgpr0 %1:vgpr_32 = COPY $vgpr1 undef %2.sub0:areg_64 = COPY %0 %2.sub1:areg_64 = COPY %1 -INLINEASM &"; use $0", 0 /* attdialect */, 4128777 /* reguse:AReg_64 */, killed %2 +INLINEASM &"; use $0", 0 /* attdialect */, 4325385 /* reguse:AReg_64 */, killed %2 SI_RETURN ... @@ -45,13 +45,13 @@ body: | ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = C
[llvm-branch-commits] [llvm] [AMDGPU] Add aperture classes to VS_64 (PR #158823)
https://github.com/arsenm commented: This probably does add inline asm support for this usage. https://github.com/llvm/llvm-project/pull/158823 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add aperture classes to VS_64 (PR #158823)
https://github.com/rampitec created https://github.com/llvm/llvm-project/pull/158823 Should not do anything. >From 2e363048d0f6ec969e6824bdaa062fee3907d853 Mon Sep 17 00:00:00 2001 From: Stanislav Mekhanoshin Date: Tue, 16 Sep 2025 00:28:29 -0700 Subject: [PATCH] [AMDGPU] Add aperture classes to VS_64 Should not do anything. --- llvm/lib/Target/AMDGPU/SIRegisterInfo.td | 8 +- .../GlobalISel/irtranslator-inline-asm.ll | 2 +- .../coalesce-copy-to-agpr-to-av-registers.mir | 230 +- ...class-vgpr-mfma-to-av-with-load-source.mir | 12 +- llvm/test/CodeGen/AMDGPU/inline-asm.i128.ll | 24 +- ...al-regcopy-and-spill-missed-at-regalloc.ll | 16 +- .../rewrite-vgpr-mfma-to-agpr-copy-from.mir | 8 +- ...gpr-mfma-to-agpr-subreg-insert-extract.mir | 12 +- ...te-vgpr-mfma-to-agpr-subreg-src2-chain.mir | 28 +-- .../CodeGen/AMDGPU/spill-vector-superclass.ll | 2 +- 10 files changed, 172 insertions(+), 170 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td index 7eccaafefc893..4e1876db41d3d 100644 --- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td +++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td @@ -1131,7 +1131,8 @@ def VS_32_Lo256 : SIRegisterClass<"AMDGPU", [i32, f32, i16, f16, bf16, v2i16, v2 let Size = 32; } -def VS_64 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32, (add VReg_64, SReg_64)> { +def VS_64 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32, +(add VReg_64, SReg_64_Encodable)> { let isAllocatable = 0; let HasVGPR = 1; let HasSGPR = 1; @@ -1139,7 +1140,7 @@ def VS_64 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32, (add VReg_64, SReg_6 } def VS_64_Align2 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32, - (add VReg_64_Align2, SReg_64)> { + (add VReg_64_Align2, SReg_64_Encodable)> { let isAllocatable = 0; let HasVGPR = 1; let HasSGPR = 1; @@ -1153,7 +1154,8 @@ def AV_32 : SIRegisterClass<"AMDGPU", VGPR_32.RegTypes, 32, (add VGPR_32, AGPR_3 let Size = 32; } -def VS_64_Lo256 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32, (add VReg_64_Lo256_Align2, SReg_64)> { +def VS_64_Lo256 : SIRegisterClass<"AMDGPU", VReg_64.RegTypes, 32, + (add VReg_64_Lo256_Align2, SReg_64_Encodable)> { let isAllocatable = 0; let HasVGPR = 1; let HasSGPR = 1; diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll index f9d11cb23fa4e..2cde060529bec 100644 --- a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll +++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll @@ -136,7 +136,7 @@ define float @test_multiple_register_outputs_same() #0 { define double @test_multiple_register_outputs_mixed() #0 { ; CHECK-LABEL: name: test_multiple_register_outputs_mixed ; CHECK: bb.1 (%ir-block.0): - ; CHECK-NEXT: INLINEASM &"v_mov_b32 $0, 0; v_add_f64 $1, 0, 0", 0 /* attdialect */, 2031626 /* regdef:VGPR_32 */, def %8, 3670026 /* regdef:VReg_64 */, def %9 + ; CHECK-NEXT: INLINEASM &"v_mov_b32 $0, 0; v_add_f64 $1, 0, 0", 0 /* attdialect */, 2031626 /* regdef:VGPR_32 */, def %8, 3735562 /* regdef:VReg_64 */, def %9 ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY %8 ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s64) = COPY %9 ; CHECK-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](s64) diff --git a/llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir b/llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir index 04cb0b14679bb..029aa3957d32b 100644 --- a/llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir +++ 
b/llvm/test/CodeGen/AMDGPU/coalesce-copy-to-agpr-to-av-registers.mir @@ -20,13 +20,13 @@ body: | ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 ; CHECK-NEXT: undef [[COPY2:%[0-9]+]].sub0:areg_64 = COPY [[COPY]] ; CHECK-NEXT: [[COPY2:%[0-9]+]].sub1:areg_64 = COPY [[COPY1]] -; CHECK-NEXT: INLINEASM &"; use $0", 0 /* attdialect */, 4128777 /* reguse:AReg_64 */, [[COPY2]] +; CHECK-NEXT: INLINEASM &"; use $0", 0 /* attdialect */, 4325385 /* reguse:AReg_64 */, [[COPY2]] ; CHECK-NEXT: SI_RETURN %0:vgpr_32 = COPY $vgpr0 %1:vgpr_32 = COPY $vgpr1 undef %2.sub0:areg_64 = COPY %0 %2.sub1:areg_64 = COPY %1 -INLINEASM &"; use $0", 0 /* attdialect */, 4128777 /* reguse:AReg_64 */, killed %2 +INLINEASM &"; use $0", 0 /* attdialect */, 4325385 /* reguse:AReg_64 */, killed %2 SI_RETURN ... @@ -45,13 +45,13 @@ body: | ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1 ; CHECK-NEXT: undef [[COPY2:%[0-9]+]].sub0:areg_64_align2 = COPY [[COPY]] ; CHECK-NEXT: [[COPY2:%[0-9]+]].sub1:areg_64_align2 = COPY [[COPY1]] -; CHECK-NEXT: INLINEASM &"; use $0", 0 /* attdialect */, 4
[llvm-branch-commits] [llvm] [AMDGPU] Add aperture classes to VS_64 (PR #158823)
rampitec wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/158823). Learn more: https://graphite.dev/docs/merge-pull-requests

* **#158823** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/158823)
* **#158754** (https://app.graphite.dev/github/pr/llvm/llvm-project/158754)
* **#158725** (https://app.graphite.dev/github/pr/llvm/llvm-project/158725)
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/158823 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] PPC: Replace PointerLikeRegClass with RegClassByHwMode (PR #158777)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/158777 >From 0821bf6b56fbcf9aebc2eea8b4e1af02f9f2d1f9 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Fri, 5 Sep 2025 18:03:59 +0900 Subject: [PATCH 1/2] PPC: Replace PointerLikeRegClass with RegClassByHwMode --- .../PowerPC/Disassembler/PPCDisassembler.cpp | 3 -- llvm/lib/Target/PowerPC/PPC.td| 6 llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | 28 ++- llvm/lib/Target/PowerPC/PPCRegisterInfo.td| 10 +-- 4 files changed, 23 insertions(+), 24 deletions(-) diff --git a/llvm/lib/Target/PowerPC/Disassembler/PPCDisassembler.cpp b/llvm/lib/Target/PowerPC/Disassembler/PPCDisassembler.cpp index 47586c417cfe3..70e619cc22b19 100644 --- a/llvm/lib/Target/PowerPC/Disassembler/PPCDisassembler.cpp +++ b/llvm/lib/Target/PowerPC/Disassembler/PPCDisassembler.cpp @@ -185,9 +185,6 @@ DecodeG8RC_NOX0RegisterClass(MCInst &Inst, uint64_t RegNo, uint64_t Address, return decodeRegisterClass(Inst, RegNo, XRegsNoX0); } -#define DecodePointerLikeRegClass0 DecodeGPRCRegisterClass -#define DecodePointerLikeRegClass1 DecodeGPRC_NOR0RegisterClass - static DecodeStatus DecodeSPERCRegisterClass(MCInst &Inst, uint64_t RegNo, uint64_t Address, const MCDisassembler *Decoder) { diff --git a/llvm/lib/Target/PowerPC/PPC.td b/llvm/lib/Target/PowerPC/PPC.td index 386d0f65d1ed1..d491e88b66ad8 100644 --- a/llvm/lib/Target/PowerPC/PPC.td +++ b/llvm/lib/Target/PowerPC/PPC.td @@ -394,6 +394,12 @@ def NotAIX : Predicate<"!Subtarget->isAIXABI()">; def IsISAFuture : Predicate<"Subtarget->isISAFuture()">; def IsNotISAFuture : Predicate<"!Subtarget->isISAFuture()">; +//===--===// +// HwModes +//===--===// + +defvar PPC32 = DefaultMode; +def PPC64 : HwMode<[In64BitMode]>; // Since new processors generally contain a superset of features of those that // came before them, the idea is to make implementations of new processors diff --git a/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp b/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp index db066bc4b7bdd..55e38bcf4afc9 100644 --- a/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp +++ b/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp @@ -2142,33 +2142,23 @@ bool PPCInstrInfo::onlyFoldImmediate(MachineInstr &UseMI, MachineInstr &DefMI, assert(UseIdx < UseMI.getNumOperands() && "Cannot find Reg in UseMI"); assert(UseIdx < UseMCID.getNumOperands() && "No operand description for Reg"); - const MCOperandInfo *UseInfo = &UseMCID.operands()[UseIdx]; - // We can fold the zero if this register requires a GPRC_NOR0/G8RC_NOX0 // register (which might also be specified as a pointer class kind). - if (UseInfo->isLookupPtrRegClass()) { -if (UseInfo->RegClass /* Kind */ != 1) - return false; - } else { -if (UseInfo->RegClass != PPC::GPRC_NOR0RegClassID && -UseInfo->RegClass != PPC::G8RC_NOX0RegClassID) - return false; - } + + const MCOperandInfo &UseInfo = UseMCID.operands()[UseIdx]; + int16_t RegClass = getOpRegClassID(UseInfo); + if (UseInfo.RegClass != PPC::GPRC_NOR0RegClassID && + UseInfo.RegClass != PPC::G8RC_NOX0RegClassID) +return false; // Make sure this is not tied to an output register (or otherwise // constrained). This is true for ST?UX registers, for example, which // are tied to their output registers. - if (UseInfo->Constraints != 0) + if (UseInfo.Constraints != 0) return false; - MCRegister ZeroReg; - if (UseInfo->isLookupPtrRegClass()) { -bool isPPC64 = Subtarget.isPPC64(); -ZeroReg = isPPC64 ? PPC::ZERO8 : PPC::ZERO; - } else { -ZeroReg = UseInfo->RegClass == PPC::G8RC_NOX0RegClassID ? 
- PPC::ZERO8 : PPC::ZERO; - } + MCRegister ZeroReg = + RegClass == PPC::G8RC_NOX0RegClassID ? PPC::ZERO8 : PPC::ZERO; LLVM_DEBUG(dbgs() << "Folded immediate zero for: "); LLVM_DEBUG(UseMI.dump()); diff --git a/llvm/lib/Target/PowerPC/PPCRegisterInfo.td b/llvm/lib/Target/PowerPC/PPCRegisterInfo.td index 8b690b7b833b3..adda91786d19c 100644 --- a/llvm/lib/Target/PowerPC/PPCRegisterInfo.td +++ b/llvm/lib/Target/PowerPC/PPCRegisterInfo.td @@ -868,7 +868,11 @@ def crbitm: Operand { def PPCRegGxRCNoR0Operand : AsmOperandClass { let Name = "RegGxRCNoR0"; let PredicateMethod = "isRegNumber"; } -def ptr_rc_nor0 : Operand, PointerLikeRegClass<1> { + +def ptr_rc_nor0 : Operand, + RegClassByHwMode< +[PPC32, PPC64], +[GPRC_NOR0, G8RC_NOX0]> { let ParserMatchClass = PPCRegGxRCNoR0Operand; } @@ -902,7 +906,9 @@ def memri34_pcrel : Operand { // memri, imm is a 34-bit value. def PPCRegGxRCOperand : AsmOperandClass { let Name = "RegGxRC"; let PredicateMethod = "isRegNumber"; } -def ptr_rc_idx : Operand, PointerLikeRegClass<0> { +def ptr_rc_idx : Operand,
[llvm-branch-commits] [llvm] PPC: Replace PointerLikeRegClass with RegClassByHwMode (PR #158777)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/158777 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add aperture classes to VS_64 (PR #158823)
https://github.com/rampitec ready_for_review https://github.com/llvm/llvm-project/pull/158823 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add aperture classes to VS_64 (PR #158823)
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/158823 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Fix codegen to emit COPY instead of S_MOV_B64 for aperture regs (PR #158754)
https://github.com/jayfoad approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/158754 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [PAC][Driver] Support ptrauth flags only on ARM64 Darwin or with pauthtest ABI (PR #113152)
https://github.com/kovdan01 updated https://github.com/llvm/llvm-project/pull/113152 >From 64489c9dd71e9ff5b0b05130e73b8e7d2ba1fde7 Mon Sep 17 00:00:00 2001 From: Daniil Kovalev Date: Mon, 21 Oct 2024 12:18:56 +0300 Subject: [PATCH 1/8] [PAC][Driver] Support ptrauth flags only on ARM64 Darwin Most ptrauth flags are ABI-affecting, so they should not be exposed to end users. Under certain conditions, some ptrauth driver flags are intended to be used for ARM64 Darwin, so allow them in this case. Leave `-faarch64-jump-table-hardening` available for all AArch64 targets since it's not ABI-affecting. --- clang/lib/Driver/ToolChains/Clang.cpp | 28 - clang/lib/Driver/ToolChains/Linux.cpp | 53 ++--- clang/test/Driver/aarch64-ptrauth.c | 164 -- 3 files changed, 135 insertions(+), 110 deletions(-) diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp index f9e6031522134..08ee45856e5e1 100644 --- a/clang/lib/Driver/ToolChains/Clang.cpp +++ b/clang/lib/Driver/ToolChains/Clang.cpp @@ -1662,34 +1662,6 @@ void Clang::AddAArch64TargetArgs(const ArgList &Args, AddUnalignedAccessWarning(CmdArgs); - Args.addOptInFlag(CmdArgs, options::OPT_fptrauth_intrinsics, -options::OPT_fno_ptrauth_intrinsics); - Args.addOptInFlag(CmdArgs, options::OPT_fptrauth_calls, -options::OPT_fno_ptrauth_calls); - Args.addOptInFlag(CmdArgs, options::OPT_fptrauth_returns, -options::OPT_fno_ptrauth_returns); - Args.addOptInFlag(CmdArgs, options::OPT_fptrauth_auth_traps, -options::OPT_fno_ptrauth_auth_traps); - Args.addOptInFlag( - CmdArgs, options::OPT_fptrauth_vtable_pointer_address_discrimination, - options::OPT_fno_ptrauth_vtable_pointer_address_discrimination); - Args.addOptInFlag( - CmdArgs, options::OPT_fptrauth_vtable_pointer_type_discrimination, - options::OPT_fno_ptrauth_vtable_pointer_type_discrimination); - Args.addOptInFlag( - CmdArgs, options::OPT_fptrauth_type_info_vtable_pointer_discrimination, - options::OPT_fno_ptrauth_type_info_vtable_pointer_discrimination); - Args.addOptInFlag( - CmdArgs, options::OPT_fptrauth_function_pointer_type_discrimination, - options::OPT_fno_ptrauth_function_pointer_type_discrimination); - - Args.addOptInFlag(CmdArgs, options::OPT_fptrauth_indirect_gotos, -options::OPT_fno_ptrauth_indirect_gotos); - Args.addOptInFlag(CmdArgs, options::OPT_fptrauth_init_fini, -options::OPT_fno_ptrauth_init_fini); - Args.addOptInFlag(CmdArgs, -options::OPT_fptrauth_init_fini_address_discrimination, -options::OPT_fno_ptrauth_init_fini_address_discrimination); Args.addOptInFlag(CmdArgs, options::OPT_faarch64_jump_table_hardening, options::OPT_fno_aarch64_jump_table_hardening); diff --git a/clang/lib/Driver/ToolChains/Linux.cpp b/clang/lib/Driver/ToolChains/Linux.cpp index 04a8ad1d165d4..1e93b3aafbf47 100644 --- a/clang/lib/Driver/ToolChains/Linux.cpp +++ b/clang/lib/Driver/ToolChains/Linux.cpp @@ -484,49 +484,16 @@ std::string Linux::ComputeEffectiveClangTriple(const llvm::opt::ArgList &Args, // options represent the default signing schema. 
static void handlePAuthABI(const Driver &D, const ArgList &DriverArgs, ArgStringList &CC1Args) { - if (!DriverArgs.hasArg(options::OPT_fptrauth_intrinsics, - options::OPT_fno_ptrauth_intrinsics)) -CC1Args.push_back("-fptrauth-intrinsics"); - - if (!DriverArgs.hasArg(options::OPT_fptrauth_calls, - options::OPT_fno_ptrauth_calls)) -CC1Args.push_back("-fptrauth-calls"); - - if (!DriverArgs.hasArg(options::OPT_fptrauth_returns, - options::OPT_fno_ptrauth_returns)) -CC1Args.push_back("-fptrauth-returns"); - - if (!DriverArgs.hasArg(options::OPT_fptrauth_auth_traps, - options::OPT_fno_ptrauth_auth_traps)) -CC1Args.push_back("-fptrauth-auth-traps"); - - if (!DriverArgs.hasArg( - options::OPT_fptrauth_vtable_pointer_address_discrimination, - options::OPT_fno_ptrauth_vtable_pointer_address_discrimination)) -CC1Args.push_back("-fptrauth-vtable-pointer-address-discrimination"); - - if (!DriverArgs.hasArg( - options::OPT_fptrauth_vtable_pointer_type_discrimination, - options::OPT_fno_ptrauth_vtable_pointer_type_discrimination)) -CC1Args.push_back("-fptrauth-vtable-pointer-type-discrimination"); - - if (!DriverArgs.hasArg( - options::OPT_fptrauth_type_info_vtable_pointer_discrimination, - options::OPT_fno_ptrauth_type_info_vtable_pointer_discrimination)) -CC1Args.push_back("-fptrauth-type-info-vtable-pointer-discrimination"); - - if (!DriverArgs.hasArg(options::OPT_fptrauth_indirect_gotos, - options::OPT_fno_ptrauth_indirect_gotos)) -CC1Args.
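A minimal sketch of the gating idea in this patch, assuming a hypothetical predicate name; the real change also special-cases the pauthtest ABI and keeps `-faarch64-jump-table-hardening` available on all AArch64 targets, since it is not ABI-affecting.

```cpp
#include "llvm/TargetParser/Triple.h"

// Hypothetical predicate: accept the ABI-affecting -fptrauth-* driver flags
// only where this patch permits them, i.e. on ARM64 Darwin targets.
static bool ptrauthFlagsAllowed(const llvm::Triple &T) {
  return T.isOSDarwin() && T.getArch() == llvm::Triple::aarch64;
}
```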
[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `local` on device (PR #157638)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/157638 >From 509959568c433d7745ca1f5387edd7654b3e1c2a Mon Sep 17 00:00:00 2001 From: ergawy Date: Tue, 2 Sep 2025 05:54:00 -0500 Subject: [PATCH] [flang][OpenMP] `do concurrent`: support `local` on device Extends support for mapping `do concurrent` on the device by adding support for `local` specifiers. The changes in this PR map the local variable to the `omp.target` op and uses the mapped value as the `private` clause operand in the nested `omp.parallel` op. --- .../include/flang/Optimizer/Dialect/FIROps.td | 12 ++ .../OpenMP/DoConcurrentConversion.cpp | 192 +++--- .../Transforms/DoConcurrent/local_device.mlir | 49 + 3 files changed, 175 insertions(+), 78 deletions(-) create mode 100644 flang/test/Transforms/DoConcurrent/local_device.mlir diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td b/flang/include/flang/Optimizer/Dialect/FIROps.td index bc971e8fd6600..fc6eedc6ed4c6 100644 --- a/flang/include/flang/Optimizer/Dialect/FIROps.td +++ b/flang/include/flang/Optimizer/Dialect/FIROps.td @@ -3894,6 +3894,18 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop", return getReduceVars().size(); } +unsigned getInductionVarsStart() { + return 0; +} + +unsigned getLocalOperandsStart() { + return getNumInductionVars(); +} + +unsigned getReduceOperandsStart() { + return getLocalOperandsStart() + getNumLocalOperands(); +} + mlir::Block::BlockArgListType getInductionVars() { return getBody()->getArguments().slice(0, getNumInductionVars()); } diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index 6c71924000842..d00a4fdd2cf2e 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -138,6 +138,9 @@ void collectLoopLiveIns(fir::DoConcurrentLoopOp loop, liveIns.push_back(operand->get()); }); + + for (mlir::Value local : loop.getLocalVars()) +liveIns.push_back(local); } /// Collects values that are local to a loop: "loop-local values". A loop-local @@ -298,8 +301,7 @@ class DoConcurrentConversion .getIsTargetDevice(); mlir::omp::TargetOperands targetClauseOps; - genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, mapper, - loopNestClauseOps, + genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, loopNestClauseOps, isTargetDevice ? nullptr : &targetClauseOps); LiveInShapeInfoMap liveInShapeInfoMap; @@ -321,14 +323,13 @@ class DoConcurrentConversion } mlir::omp::ParallelOp parallelOp = -genParallelOp(doLoop.getLoc(), rewriter, ivInfos, mapper); +genParallelOp(rewriter, loop, ivInfos, mapper); // Only set as composite when part of `distribute parallel do`. parallelOp.setComposite(mapToDevice); if (!mapToDevice) - genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, mapper, - loopNestClauseOps); + genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, loopNestClauseOps); for (mlir::Value local : locals) looputils::localizeLoopLocalValue(local, parallelOp.getRegion(), @@ -337,10 +338,38 @@ class DoConcurrentConversion if (mapToDevice) genDistributeOp(doLoop.getLoc(), rewriter).setComposite(/*val=*/true); -mlir::omp::LoopNestOp ompLoopNest = +auto [loopNestOp, wsLoopOp] = genWsLoopOp(rewriter, loop, mapper, loopNestClauseOps, /*isComposite=*/mapToDevice); +// `local` region arguments are transferred/cloned from the `do concurrent` +// loop to the loopnest op when the region is cloned above. Instead, these +// region arguments should be on the workshare loop's region. 
+if (mapToDevice) { + for (auto [parallelArg, loopNestArg] : llvm::zip_equal( + parallelOp.getRegion().getArguments(), + loopNestOp.getRegion().getArguments().slice( + loop.getLocalOperandsStart(), loop.getNumLocalOperands( +rewriter.replaceAllUsesWith(loopNestArg, parallelArg); + + for (auto [wsloopArg, loopNestArg] : llvm::zip_equal( + wsLoopOp.getRegion().getArguments(), + loopNestOp.getRegion().getArguments().slice( + loop.getReduceOperandsStart(), loop.getNumReduceOperands( +rewriter.replaceAllUsesWith(loopNestArg, wsloopArg); +} else { + for (auto [wsloopArg, loopNestArg] : + llvm::zip_equal(wsLoopOp.getRegion().getArguments(), + loopNestOp.getRegion().getArguments().drop_front( + loopNestClauseOps.loopLowerBounds.size( +rewriter.replaceAllUsesWith(loopNestArg, wsloopArg); +} + +for (unsigned i = 0; + i
[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `reduce` on device (PR #156610)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/156610 >From 5b9f17606b95f689a7ffb0187d103b2a4bd62e24 Mon Sep 17 00:00:00 2001 From: ergawy Date: Tue, 2 Sep 2025 08:36:34 -0500 Subject: [PATCH] [flang][OpenMP] `do concurrent`: support `reduce` on device Extends `do concurrent` to OpenMP device mapping by adding support for mapping `reduce` specifiers to omp `reduction` clauses. The changes attach 2 `reduction` clauses to the mapped OpenMP construct: one on the `teams` part of the construct and one on the `wloop` part. --- .../OpenMP/DoConcurrentConversion.cpp | 117 ++ .../DoConcurrent/reduce_device.mlir | 53 2 files changed, 121 insertions(+), 49 deletions(-) create mode 100644 flang/test/Transforms/DoConcurrent/reduce_device.mlir diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp index d00a4fdd2cf2e..6e308499100fa 100644 --- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp +++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp @@ -141,6 +141,9 @@ void collectLoopLiveIns(fir::DoConcurrentLoopOp loop, for (mlir::Value local : loop.getLocalVars()) liveIns.push_back(local); + + for (mlir::Value reduce : loop.getReduceVars()) +liveIns.push_back(reduce); } /// Collects values that are local to a loop: "loop-local values". A loop-local @@ -319,7 +322,7 @@ class DoConcurrentConversion targetOp = genTargetOp(doLoop.getLoc(), rewriter, mapper, loopNestLiveIns, targetClauseOps, loopNestClauseOps, liveInShapeInfoMap); - genTeamsOp(doLoop.getLoc(), rewriter); + genTeamsOp(rewriter, loop, mapper); } mlir::omp::ParallelOp parallelOp = @@ -492,46 +495,7 @@ class DoConcurrentConversion if (!mapToDevice) genPrivatizers(rewriter, mapper, loop, wsloopClauseOps); -if (!loop.getReduceVars().empty()) { - for (auto [op, byRef, sym, arg] : llvm::zip_equal( - loop.getReduceVars(), loop.getReduceByrefAttr().asArrayRef(), - loop.getReduceSymsAttr().getAsRange(), - loop.getRegionReduceArgs())) { -auto firReducer = moduleSymbolTable.lookup( -sym.getLeafReference()); - -mlir::OpBuilder::InsertionGuard guard(rewriter); -rewriter.setInsertionPointAfter(firReducer); -std::string ompReducerName = sym.getLeafReference().str() + ".omp"; - -auto ompReducer = -moduleSymbolTable.lookup( -rewriter.getStringAttr(ompReducerName)); - -if (!ompReducer) { - ompReducer = mlir::omp::DeclareReductionOp::create( - rewriter, firReducer.getLoc(), ompReducerName, - firReducer.getTypeAttr().getValue()); - - cloneFIRRegionToOMP(rewriter, firReducer.getAllocRegion(), - ompReducer.getAllocRegion()); - cloneFIRRegionToOMP(rewriter, firReducer.getInitializerRegion(), - ompReducer.getInitializerRegion()); - cloneFIRRegionToOMP(rewriter, firReducer.getReductionRegion(), - ompReducer.getReductionRegion()); - cloneFIRRegionToOMP(rewriter, firReducer.getAtomicReductionRegion(), - ompReducer.getAtomicReductionRegion()); - cloneFIRRegionToOMP(rewriter, firReducer.getCleanupRegion(), - ompReducer.getCleanupRegion()); - moduleSymbolTable.insert(ompReducer); -} - -wsloopClauseOps.reductionVars.push_back(op); -wsloopClauseOps.reductionByref.push_back(byRef); -wsloopClauseOps.reductionSyms.push_back( -mlir::SymbolRefAttr::get(ompReducer)); - } -} +genReductions(rewriter, mapper, loop, wsloopClauseOps); auto wsloopOp = mlir::omp::WsloopOp::create(rewriter, loop.getLoc(), wsloopClauseOps); @@ -553,8 +517,6 @@ class DoConcurrentConversion rewriter.setInsertionPointToEnd(&loopNestOp.getRegion().back()); mlir::omp::YieldOp::create(rewriter, 
loop->getLoc()); -loop->getParentOfType().print( -llvm::errs(), mlir::OpPrintingFlags().assumeVerified()); return {loopNestOp, wsloopOp}; } @@ -778,15 +740,26 @@ class DoConcurrentConversion liveInName, shape); } - mlir::omp::TeamsOp - genTeamsOp(mlir::Location loc, - mlir::ConversionPatternRewriter &rewriter) const { -auto teamsOp = rewriter.create( -loc, /*clauses=*/mlir::omp::TeamsOperands{}); + mlir::omp::TeamsOp genTeamsOp(mlir::ConversionPatternRewriter &rewriter, +fir::DoConcurrentLoopOp loop, +mlir::IRMapping &mapper) const { +mlir::omp::TeamsOperands teamsOps; +genReductions(rewriter, mapper, loop, teamsOps); + +mlir::Location loc = loop.getLoc(); +aut
[llvm-branch-commits] [llvm] release/21.x: [Loads] Check for overflow when adding MaxPtrDiff + Offset. (PR #158918)
https://github.com/fhahn milestoned https://github.com/llvm/llvm-project/pull/158918 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [Loads] Check for overflow when adding MaxPtrDiff + Offset. (PR #158918)
https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/158918 MaxPtrDiff + Offset may wrap, leading to incorrect results. Use uadd_ov to check for overflow. (cherry picked from commit cf444ac2adc45c1079856087b8ba9a04466f78db) >From 89c5e7e99f08f6f79aafa2ab91b0e224194f95b6 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Tue, 2 Sep 2025 09:37:19 +0100 Subject: [PATCH] [Loads] Check for overflow when adding MaxPtrDiff + Offset. MaxPtrDiff + Offset may wrap, leading to incorrect results. Use uadd_ov to check for overflow. (cherry picked from commit cf444ac2adc45c1079856087b8ba9a04466f78db) --- llvm/lib/Analysis/Loads.cpp | 5 +- .../LoopVectorize/load-deref-pred-align.ll| 130 ++ 2 files changed, 134 insertions(+), 1 deletion(-) diff --git a/llvm/lib/Analysis/Loads.cpp b/llvm/lib/Analysis/Loads.cpp index 393f2648de3c9..fcc2cf2f7e8e7 100644 --- a/llvm/lib/Analysis/Loads.cpp +++ b/llvm/lib/Analysis/Loads.cpp @@ -382,7 +382,10 @@ bool llvm::isDereferenceableAndAlignedInLoop( if (Offset->getAPInt().urem(Alignment.value()) != 0) return false; -AccessSize = MaxPtrDiff + Offset->getAPInt(); +bool Overflow = false; +AccessSize = MaxPtrDiff.uadd_ov(Offset->getAPInt(), Overflow); +if (Overflow) + return false; AccessSizeSCEV = SE.getAddExpr(PtrDiff, Offset); Base = NewBase->getValue(); } else diff --git a/llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll b/llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll index 8a326c9d0c083..7c2c3883e1dc7 100644 --- a/llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll +++ b/llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll @@ -753,3 +753,133 @@ exit: call void @llvm.memcpy.p0.p0.i64(ptr %dest, ptr %local_dest, i64 1024, i1 false) ret void } + +define void @adding_offset_overflows(i32 %n, ptr %A) { +; CHECK-LABEL: @adding_offset_overflows( +; CHECK-NEXT: entry: +; CHECK-NEXT:[[B:%.*]] = alloca [62 x i32], align 4 +; CHECK-NEXT:[[C:%.*]] = alloca [144 x i32], align 4 +; CHECK-NEXT:call void @init(ptr [[B]]) +; CHECK-NEXT:call void @init(ptr [[C]]) +; CHECK-NEXT:[[PRE:%.*]] = icmp slt i32 [[N:%.*]], 1 +; CHECK-NEXT:br i1 [[PRE]], label [[EXIT:%.*]], label [[PH:%.*]] +; CHECK: ph: +; CHECK-NEXT:[[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64 +; CHECK-NEXT:[[TMP0:%.*]] = add nsw i64 [[WIDE_TRIP_COUNT]], -1 +; CHECK-NEXT:[[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP0]], 2 +; CHECK-NEXT:br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] +; CHECK: vector.ph: +; CHECK-NEXT:[[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 2 +; CHECK-NEXT:[[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]] +; CHECK-NEXT:[[TMP1:%.*]] = add i64 1, [[N_VEC]] +; CHECK-NEXT:br label [[VECTOR_BODY:%.*]] +; CHECK: vector.body: +; CHECK-NEXT:[[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE3:%.*]] ] +; CHECK-NEXT:[[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]] +; CHECK-NEXT:[[TMP2:%.*]] = getelementptr i32, ptr [[A:%.*]], i64 [[OFFSET_IDX]] +; CHECK-NEXT:[[TMP23:%.*]] = getelementptr i32, ptr [[TMP2]], i32 0 +; CHECK-NEXT:[[WIDE_LOAD:%.*]] = load <2 x i32>, ptr [[TMP23]], align 4 +; CHECK-NEXT:[[TMP3:%.*]] = icmp ne <2 x i32> [[WIDE_LOAD]], zeroinitializer +; CHECK-NEXT:[[TMP4:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0 +; CHECK-NEXT:br i1 [[TMP4]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]] +; CHECK: pred.load.if: +; CHECK-NEXT:[[TMP15:%.*]] = add i64 [[OFFSET_IDX]], 0 +; CHECK-NEXT:[[TMP16:%.*]] = getelementptr i32, ptr [[B]], i64 [[TMP15]] +; CHECK-NEXT:[[TMP17:%.*]] = load i32, 
ptr [[TMP16]], align 4 +; CHECK-NEXT:[[TMP18:%.*]] = insertelement <2 x i32> poison, i32 [[TMP17]], i32 0 +; CHECK-NEXT:br label [[PRED_LOAD_CONTINUE]] +; CHECK: pred.load.continue: +; CHECK-NEXT:[[TMP19:%.*]] = phi <2 x i32> [ poison, [[VECTOR_BODY]] ], [ [[TMP18]], [[PRED_LOAD_IF]] ] +; CHECK-NEXT:[[TMP20:%.*]] = extractelement <2 x i1> [[TMP3]], i32 1 +; CHECK-NEXT:br i1 [[TMP20]], label [[PRED_LOAD_IF1:%.*]], label [[PRED_LOAD_CONTINUE2:%.*]] +; CHECK: pred.load.if1: +; CHECK-NEXT:[[TMP21:%.*]] = add i64 [[OFFSET_IDX]], 1 +; CHECK-NEXT:[[TMP22:%.*]] = getelementptr i32, ptr [[B]], i64 [[TMP21]] +; CHECK-NEXT:[[TMP13:%.*]] = load i32, ptr [[TMP22]], align 4 +; CHECK-NEXT:[[TMP14:%.*]] = insertelement <2 x i32> [[TMP19]], i32 [[TMP13]], i32 1 +; CHECK-NEXT:br label [[PRED_LOAD_CONTINUE2]] +; CHECK: pred.load.continue2: +; CHECK-NEXT:[[WIDE_LOAD1:%.*]] = phi <2 x i32> [ [[TMP19]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP14]], [[PRED_LOAD_IF1]] ] +; CHECK-NEXT:[[TMP5:%.*]] = sext <2 x i32> [[WIDE_LOAD1]] to <2 x i64> +; CHECK-NEXT:[[TMP6:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0 +; CHECK-NEXT:br i1 [[TMP6]], la
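The heart of the fix is the wrap-checked addition. A self-contained sketch of the pattern, assuming only `llvm::APInt` is in scope:

```cpp
#include "llvm/ADT/APInt.h"

// APInt arithmetic wraps silently, so the end of the accessed range must be
// computed with the overflow-reporting form; on overflow the analysis gives
// up rather than reasoning about a wrapped access size.
static bool computeAccessSize(const llvm::APInt &MaxPtrDiff,
                              const llvm::APInt &Offset,
                              llvm::APInt &AccessSize) {
  bool Overflow = false;
  AccessSize = MaxPtrDiff.uadd_ov(Offset, Overflow);
  return !Overflow; // false: range wraps, not provably dereferenceable
}
```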
[llvm-branch-commits] [llvm] release/21.x: [Loads] Check for overflow when adding MaxPtrDiff + Offset. (PR #158918)
https://github.com/nikic approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/158918 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [Loads] Check for overflow when adding MaxPtrDiff + Offset. (PR #158918)
llvmbot wrote: @llvm/pr-subscribers-llvm-analysis Author: Florian Hahn (fhahn) Changes MaxPtrDiff + Offset may wrap, leading to incorrect results. Use uadd_ov to check for overflow. (cherry picked from commit cf444ac2adc45c1079856087b8ba9a04466f78db) --- Full diff: https://github.com/llvm/llvm-project/pull/158918.diff 2 Files Affected: - (modified) llvm/lib/Analysis/Loads.cpp (+4-1) - (modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll (+130) ``diff diff --git a/llvm/lib/Analysis/Loads.cpp b/llvm/lib/Analysis/Loads.cpp index 393f2648de3c9..fcc2cf2f7e8e7 100644 --- a/llvm/lib/Analysis/Loads.cpp +++ b/llvm/lib/Analysis/Loads.cpp @@ -382,7 +382,10 @@ bool llvm::isDereferenceableAndAlignedInLoop( if (Offset->getAPInt().urem(Alignment.value()) != 0) return false; -AccessSize = MaxPtrDiff + Offset->getAPInt(); +bool Overflow = false; +AccessSize = MaxPtrDiff.uadd_ov(Offset->getAPInt(), Overflow); +if (Overflow) + return false; AccessSizeSCEV = SE.getAddExpr(PtrDiff, Offset); Base = NewBase->getValue(); } else diff --git a/llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll b/llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll index 8a326c9d0c083..7c2c3883e1dc7 100644 --- a/llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll +++ b/llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll @@ -753,3 +753,133 @@ exit: call void @llvm.memcpy.p0.p0.i64(ptr %dest, ptr %local_dest, i64 1024, i1 false) ret void } + +define void @adding_offset_overflows(i32 %n, ptr %A) { +; CHECK-LABEL: @adding_offset_overflows( +; CHECK-NEXT: entry: +; CHECK-NEXT:[[B:%.*]] = alloca [62 x i32], align 4 +; CHECK-NEXT:[[C:%.*]] = alloca [144 x i32], align 4 +; CHECK-NEXT:call void @init(ptr [[B]]) +; CHECK-NEXT:call void @init(ptr [[C]]) +; CHECK-NEXT:[[PRE:%.*]] = icmp slt i32 [[N:%.*]], 1 +; CHECK-NEXT:br i1 [[PRE]], label [[EXIT:%.*]], label [[PH:%.*]] +; CHECK: ph: +; CHECK-NEXT:[[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64 +; CHECK-NEXT:[[TMP0:%.*]] = add nsw i64 [[WIDE_TRIP_COUNT]], -1 +; CHECK-NEXT:[[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP0]], 2 +; CHECK-NEXT:br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]] +; CHECK: vector.ph: +; CHECK-NEXT:[[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 2 +; CHECK-NEXT:[[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]] +; CHECK-NEXT:[[TMP1:%.*]] = add i64 1, [[N_VEC]] +; CHECK-NEXT:br label [[VECTOR_BODY:%.*]] +; CHECK: vector.body: +; CHECK-NEXT:[[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE3:%.*]] ] +; CHECK-NEXT:[[OFFSET_IDX:%.*]] = add i64 1, [[INDEX]] +; CHECK-NEXT:[[TMP2:%.*]] = getelementptr i32, ptr [[A:%.*]], i64 [[OFFSET_IDX]] +; CHECK-NEXT:[[TMP23:%.*]] = getelementptr i32, ptr [[TMP2]], i32 0 +; CHECK-NEXT:[[WIDE_LOAD:%.*]] = load <2 x i32>, ptr [[TMP23]], align 4 +; CHECK-NEXT:[[TMP3:%.*]] = icmp ne <2 x i32> [[WIDE_LOAD]], zeroinitializer +; CHECK-NEXT:[[TMP4:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0 +; CHECK-NEXT:br i1 [[TMP4]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]] +; CHECK: pred.load.if: +; CHECK-NEXT:[[TMP15:%.*]] = add i64 [[OFFSET_IDX]], 0 +; CHECK-NEXT:[[TMP16:%.*]] = getelementptr i32, ptr [[B]], i64 [[TMP15]] +; CHECK-NEXT:[[TMP17:%.*]] = load i32, ptr [[TMP16]], align 4 +; CHECK-NEXT:[[TMP18:%.*]] = insertelement <2 x i32> poison, i32 [[TMP17]], i32 0 +; CHECK-NEXT:br label [[PRED_LOAD_CONTINUE]] +; CHECK: pred.load.continue: +; CHECK-NEXT:[[TMP19:%.*]] = phi <2 x i32> [ poison, [[VECTOR_BODY]] ], [ [[TMP18]], 
[[PRED_LOAD_IF]] ] +; CHECK-NEXT:[[TMP20:%.*]] = extractelement <2 x i1> [[TMP3]], i32 1 +; CHECK-NEXT:br i1 [[TMP20]], label [[PRED_LOAD_IF1:%.*]], label [[PRED_LOAD_CONTINUE2:%.*]] +; CHECK: pred.load.if1: +; CHECK-NEXT:[[TMP21:%.*]] = add i64 [[OFFSET_IDX]], 1 +; CHECK-NEXT:[[TMP22:%.*]] = getelementptr i32, ptr [[B]], i64 [[TMP21]] +; CHECK-NEXT:[[TMP13:%.*]] = load i32, ptr [[TMP22]], align 4 +; CHECK-NEXT:[[TMP14:%.*]] = insertelement <2 x i32> [[TMP19]], i32 [[TMP13]], i32 1 +; CHECK-NEXT:br label [[PRED_LOAD_CONTINUE2]] +; CHECK: pred.load.continue2: +; CHECK-NEXT:[[WIDE_LOAD1:%.*]] = phi <2 x i32> [ [[TMP19]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP14]], [[PRED_LOAD_IF1]] ] +; CHECK-NEXT:[[TMP5:%.*]] = sext <2 x i32> [[WIDE_LOAD1]] to <2 x i64> +; CHECK-NEXT:[[TMP6:%.*]] = extractelement <2 x i1> [[TMP3]], i32 0 +; CHECK-NEXT:br i1 [[TMP6]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]] +; CHECK: pred.store.if: +; CHECK-NEXT:[[TMP7:%.*]] = extractelement <2 x i64> [[TMP5]], i32 0 +; CHECK-NEXT:[[TMP8:%.*]] = getelementptr i32, ptr [[C]], i64 [[TMP7]] +; CHECK-NEXT:store i32 0, ptr [[TMP8]], align 4 +; CHECK-NEX
[llvm-branch-commits] [llvm] release/21.x: [Loads] Check for overflow when adding MaxPtrDiff + Offset. (PR #158918)
fhahn wrote: This fixes a mis-compile when bootstrapping Clang with sanitizers on macOS https://github.com/llvm/llvm-project/pull/158918 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [mlir] [MLIR] Add new complex.powi op (PR #158722)
https://github.com/tblah commented: LGTM once the existing comments are addressed. https://github.com/llvm/llvm-project/pull/158722 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)
https://github.com/ergawy updated https://github.com/llvm/llvm-project/pull/156837 >From ccf3696848367835c15e973c7a7b0d76297be31c Mon Sep 17 00:00:00 2001 From: ergawy Date: Thu, 4 Sep 2025 01:06:21 -0500 Subject: [PATCH 1/2] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU Fixes a bug related to insertion points when inlining multi-block combiner reduction regions. The IP at the end of the inlined region was not used resulting in emitting BBs with multiple terminators. --- llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 3 + .../omptarget-multi-block-reduction.mlir | 85 +++ 2 files changed, 88 insertions(+) create mode 100644 mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp index d1f78c32596ba..f4acb60a99bf0 100644 --- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp +++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp @@ -3507,6 +3507,8 @@ Expected OpenMPIRBuilder::createReductionFunction( return AfterIP.takeError(); if (!Builder.GetInsertBlock()) return ReductionFunc; + + Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint()); Builder.CreateStore(Reduced, LHSPtr); } } @@ -3751,6 +3753,7 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createReductionsGPU( RI.ReductionGen(Builder.saveIP(), RHSValue, LHSValue, Reduced); if (!AfterIP) return AfterIP.takeError(); + Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint()); Builder.CreateStore(Reduced, LHS, false); } } diff --git a/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir new file mode 100644 index 0..aaf06d2d0e0c2 --- /dev/null +++ b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir @@ -0,0 +1,85 @@ +// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s + +// Verifies that the IR builder can handle reductions with multi-block combiner +// regions on the GPU. 
+ +module attributes {dlti.dl_spec = #dlti.dl_spec<"dlti.alloca_memory_space" = 5 : ui64, "dlti.global_memory_space" = 1 : ui64>, llvm.target_triple = "amdgcn-amd-amdhsa", omp.is_gpu = true, omp.is_target_device = true} { + llvm.func @bar() {} + llvm.func @baz() {} + + omp.declare_reduction @add_reduction_byref_box_5xf32 : !llvm.ptr alloc { +%0 = llvm.mlir.constant(1 : i64) : i64 +%1 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)> : (i64) -> !llvm.ptr<5> +%2 = llvm.addrspacecast %1 : !llvm.ptr<5> to !llvm.ptr +omp.yield(%2 : !llvm.ptr) + } init { + ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr): +omp.yield(%arg1 : !llvm.ptr) + } combiner { + ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr): +llvm.call @bar() : () -> () +llvm.br ^bb3 + + ^bb3: // pred: ^bb1 +llvm.call @baz() : () -> () +omp.yield(%arg0 : !llvm.ptr) + } + llvm.func @foo_() { +%c1 = llvm.mlir.constant(1 : i64) : i64 +%10 = llvm.alloca %c1 x !llvm.array<5 x f32> {bindc_name = "x"} : (i64) -> !llvm.ptr<5> +%11 = llvm.addrspacecast %10 : !llvm.ptr<5> to !llvm.ptr +%74 = omp.map.info var_ptr(%11 : !llvm.ptr, !llvm.array<5 x f32>) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = "x"} +omp.target map_entries(%74 -> %arg0 : !llvm.ptr) { + %c1_2 = llvm.mlir.constant(1 : i32) : i32 + %c10 = llvm.mlir.constant(10 : i32) : i32 + omp.teams reduction(byref @add_reduction_byref_box_5xf32 %arg0 -> %arg2 : !llvm.ptr) { +omp.parallel { + omp.distribute { +omp.wsloop { + omp.loop_nest (%arg5) : i32 = (%c1_2) to (%c10) inclusive step (%c1_2) { +omp.yield + } +} {omp.composite} + } {omp.composite} + omp.terminator +} {omp.composite} +omp.terminator + } + omp.terminator +} +llvm.return + } +} + +// CHECK: call void @__kmpc_parallel_51({{.*}}, i32 1, i32 -1, i32 -1, +// CHECK-SAME: ptr @[[PAR_OUTLINED:.*]], ptr null, ptr %2, i64 1) + +// CHECK: define internal void @[[PAR_OUTLINED]]{{.*}} { +// CHECK: .omp.reduction.then: +// CHECK: br label %omp.reduction.nonatomic.body + +// CHECK: omp.reduction.nonatomic.body: +// CHECK: call void @bar() +// CHECK: br label %[[BODY_2ND_BB:.*]] + +// CHECK: [[BODY_2ND_BB]]: +// CHECK: call void @baz() +// CHECK: br label %[[CONT_BB:.*]] + +// CHECK: [[CONT_BB]]: +// CHECK: br label %.omp.reduction.done +// CHECK: } + +// CHECK: define internal void @"{{.*}}$reduction$reduction_func"(ptr noundef %0, ptr noundef %1) #0 { +// CHECK: br label %omp.reduction.nonatomic.body + +// CHECK: [[BODY_2ND_BB:.*]]: +// CHECK: call void @baz() +// CHECK: br label %omp.region.cont + + +// CHECK: omp.reduction.nonatomic.body: +// CHECK: call void @b
[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)
abidh wrote: Thanks for handling my comments. It looks good to me, but I have one question. This patch sets the insertion point so that the store instruction gets generated at the correct place, but the test does not have any store instruction. I was just wondering if the test is checking the right thing. https://github.com/llvm/llvm-project/pull/156837 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [RISCV] Reduce RISCV code generation build time (PR #158164)
compnerd wrote: > I do not know what this error means or how to fix it: > > ``` > error: Expected version 21.1.2 but found version 21.1.1 > ``` This just needs to be updated in CMakeLists.txt https://github.com/llvm/llvm-project/pull/158164 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [CodeGen][CFI] Generalize transparent union in args of args of functions (PR #158194)
https://github.com/vitalybuka converted_to_draft https://github.com/llvm/llvm-project/pull/158194 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopUnroll] Fix block frequencies for epilogue (PR #159163)
https://github.com/jdenny-ornl updated https://github.com/llvm/llvm-project/pull/159163 >From 5a9959313c0aebc1c707d19e30055cb925be7760 Mon Sep 17 00:00:00 2001 From: "Joel E. Denny" Date: Tue, 16 Sep 2025 16:03:11 -0400 Subject: [PATCH 1/2] [LoopUnroll] Fix block frequencies for epilogue As another step in issue #135812, this patch fixes block frequencies for partial loop unrolling with an epilogue remainder loop. It does not fully handle the case when the epilogue loop itself is unrolled. That will be handled in the next patch. For the guard and latch of each of the unrolled loop and epilogue loop, this patch sets branch weights derived directly from the original loop latch branch weights. The total frequency of the original loop body, summed across all its occurrences in the unrolled loop and epilogue loop, is the same as in the original loop. This patch also sets `llvm.loop.estimated_trip_count` for the epilogue loop instead of relying on the epilogue's latch branch weights to imply it. This patch removes the XFAIL directives that PR #157754 added to the test suite. --- .../include/llvm/Transforms/Utils/LoopUtils.h | 32 .../llvm/Transforms/Utils/UnrollLoop.h| 4 +- llvm/lib/Transforms/Utils/LoopUnroll.cpp | 31 ++-- .../Transforms/Utils/LoopUnrollRuntime.cpp| 94 -- llvm/lib/Transforms/Utils/LoopUtils.cpp | 48 ++ .../branch-weights-freq/unroll-epilog.ll | 160 ++ .../runtime-exit-phi-scev-invalidation.ll | 4 +- .../LoopUnroll/runtime-loop-branchweight.ll | 56 +- .../Transforms/LoopUnroll/runtime-loop.ll | 9 +- .../LoopUnroll/unroll-heuristics-pgo.ll | 64 +-- 10 files changed, 448 insertions(+), 54 deletions(-) create mode 100644 llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-epilog.ll diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h index c5dbb2bdd1dd8..71754b8f62a16 100644 --- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h +++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h @@ -365,6 +365,38 @@ LLVM_ABI bool setLoopEstimatedTripCount( Loop *L, unsigned EstimatedTripCount, std::optional EstimatedLoopInvocationWeight = std::nullopt); +/// Based on branch weight metadata, return either: +/// - \c std::nullopt if the implementation is unable to handle the loop form +/// of \p L (e.g., \p L must have a latch block that controls the loop exit). +/// - Else, the estimated probability that, at the end of any iteration, the +/// latch of \p L will start another iteration. The result \c P is such that +/// `0 <= P <= 1`, and `1 - P` is the probability of exiting the loop. +std::optional getLoopProbability(Loop *L); + +/// Set branch weight metadata for the latch of \p L to indicate that, at the +/// end of any iteration, its estimated probability of starting another +/// iteration is \p P. Return false if the implementation is unable to handle +/// the loop form of \p L (e.g., \p L must have a latch block that controls the +/// loop exit). Otherwise, return true. +bool setLoopProbability(Loop *L, double P); + +/// Based on branch weight metadata, return either: +/// - \c std::nullopt if the implementation cannot extract the probability +/// (e.g., \p B must have exactly two target labels, so it must be a +/// conditional branch). +/// - The probability \c P that control flows from \p B to its first target +/// label such that `1 - P` is the probability of control flowing to its +/// second target label, or vice-versa if \p ForFirstTarget is false. 
+std::optional<double> getBranchProbability(BranchInst *B, bool ForFirstTarget); + +/// Set branch weight metadata for \p B to indicate that \p P and `1 - P` are +/// the probabilities of control flowing to its first and second target labels, +/// respectively, or vice-versa if \p ForFirstTarget is false. Return false if +/// the implementation cannot set the probability (e.g., \p B must have exactly +/// two target labels, so it must be a conditional branch). Otherwise, return +/// true. +bool setBranchProbability(BranchInst *B, double P, bool ForFirstTarget); + /// Check inner loop (L) backedge count is known to be invariant on all /// iterations of its outer loop. If the loop has no parent, this is trivially /// true. diff --git a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h index 871c13d972470..571a0af6fd0db 100644 --- a/llvm/include/llvm/Transforms/Utils/UnrollLoop.h +++ b/llvm/include/llvm/Transforms/Utils/UnrollLoop.h @@ -97,7 +97,9 @@ LLVM_ABI bool UnrollRuntimeLoopRemainder( LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC, const TargetTransformInfo *TTI, bool PreserveLCSSA, unsigned SCEVExpansionBudget, bool RuntimeUnrollMultiExit, -Loop **ResultLoop = nullptr); +Loop **ResultLoop = nullptr, +std::optional<unsigned> OriginalTripCount = std::nullopt,
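A minimal sketch of how the two latch helpers above might compose, assuming a simple P^UF scaling model; the patch excerpt does not show its exact formula, so the helper name and the model here are hypothetical:

#include "llvm/Transforms/Utils/LoopUtils.h" // declares the helpers above
#include <cmath>
#include <optional>

using namespace llvm;

// Hypothetical: if the original latch takes its backedge with probability P,
// one unrolled iteration covers UF original iterations, so the unrolled
// latch takes its backedge with roughly P^UF.
static bool rescaleUnrolledLatch(Loop *L, unsigned UF) {
  std::optional<double> P = getLoopProbability(L);
  if (!P)
    return false; // loop form not handled (e.g., no latch-controlled exit)
  return setLoopProbability(L, std::pow(*P, static_cast<double>(UF)));
}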
[llvm-branch-commits] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)
fhahn wrote: Sounds good to me! https://github.com/llvm/llvm-project/pull/156715 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [HLSL] Use static create methods to initialize resources in arrays (PR #157005)
https://github.com/llvm-beanz approved this pull request. https://github.com/llvm/llvm-project/pull/157005 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Stop using aligned VGPR classes for addRegisterClass (PR #158278)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/158278 >From f6208fe1d18e2406ca9b6e84adbb35051b6ce94d Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Fri, 12 Sep 2025 20:45:56 +0900 Subject: [PATCH] AMDGPU: Stop using aligned VGPR classes for addRegisterClass This is unnecessary. At use emission time, InstrEmitter will use the common subclass of the value type's register class and the use instruction register classes. This removes one of the obstacles to treating special case instructions that do not have the alignment requirement overly conservatively. --- llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 32 +++ llvm/test/CodeGen/AMDGPU/mfma-loop.ll | 14 +- 2 files changed, 24 insertions(+), 22 deletions(-) diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp index 6a4df5eeb9779..4369b40e65103 100644 --- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp +++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp @@ -111,52 +111,52 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM, addRegisterClass(MVT::Untyped, V64RegClass); addRegisterClass(MVT::v3i32, &AMDGPU::SGPR_96RegClass); - addRegisterClass(MVT::v3f32, TRI->getVGPRClassForBitWidth(96)); + addRegisterClass(MVT::v3f32, &AMDGPU::VReg_96RegClass); addRegisterClass(MVT::v2i64, &AMDGPU::SGPR_128RegClass); addRegisterClass(MVT::v2f64, &AMDGPU::SGPR_128RegClass); addRegisterClass(MVT::v4i32, &AMDGPU::SGPR_128RegClass); - addRegisterClass(MVT::v4f32, TRI->getVGPRClassForBitWidth(128)); + addRegisterClass(MVT::v4f32, &AMDGPU::VReg_128RegClass); addRegisterClass(MVT::v5i32, &AMDGPU::SGPR_160RegClass); - addRegisterClass(MVT::v5f32, TRI->getVGPRClassForBitWidth(160)); + addRegisterClass(MVT::v5f32, &AMDGPU::VReg_160RegClass); addRegisterClass(MVT::v6i32, &AMDGPU::SGPR_192RegClass); - addRegisterClass(MVT::v6f32, TRI->getVGPRClassForBitWidth(192)); + addRegisterClass(MVT::v6f32, &AMDGPU::VReg_192RegClass); addRegisterClass(MVT::v3i64, &AMDGPU::SGPR_192RegClass); - addRegisterClass(MVT::v3f64, TRI->getVGPRClassForBitWidth(192)); + addRegisterClass(MVT::v3f64, &AMDGPU::VReg_192RegClass); addRegisterClass(MVT::v7i32, &AMDGPU::SGPR_224RegClass); - addRegisterClass(MVT::v7f32, TRI->getVGPRClassForBitWidth(224)); + addRegisterClass(MVT::v7f32, &AMDGPU::VReg_224RegClass); addRegisterClass(MVT::v8i32, &AMDGPU::SGPR_256RegClass); - addRegisterClass(MVT::v8f32, TRI->getVGPRClassForBitWidth(256)); + addRegisterClass(MVT::v8f32, &AMDGPU::VReg_256RegClass); addRegisterClass(MVT::v4i64, &AMDGPU::SGPR_256RegClass); - addRegisterClass(MVT::v4f64, TRI->getVGPRClassForBitWidth(256)); + addRegisterClass(MVT::v4f64, &AMDGPU::VReg_256RegClass); addRegisterClass(MVT::v9i32, &AMDGPU::SGPR_288RegClass); - addRegisterClass(MVT::v9f32, TRI->getVGPRClassForBitWidth(288)); + addRegisterClass(MVT::v9f32, &AMDGPU::VReg_288RegClass); addRegisterClass(MVT::v10i32, &AMDGPU::SGPR_320RegClass); - addRegisterClass(MVT::v10f32, TRI->getVGPRClassForBitWidth(320)); + addRegisterClass(MVT::v10f32, &AMDGPU::VReg_320RegClass); addRegisterClass(MVT::v11i32, &AMDGPU::SGPR_352RegClass); - addRegisterClass(MVT::v11f32, TRI->getVGPRClassForBitWidth(352)); + addRegisterClass(MVT::v11f32, &AMDGPU::VReg_352RegClass); addRegisterClass(MVT::v12i32, &AMDGPU::SGPR_384RegClass); - addRegisterClass(MVT::v12f32, TRI->getVGPRClassForBitWidth(384)); + addRegisterClass(MVT::v12f32, &AMDGPU::VReg_384RegClass); addRegisterClass(MVT::v16i32, &AMDGPU::SGPR_512RegClass); - addRegisterClass(MVT::v16f32, TRI->getVGPRClassForBitWidth(512)); + 
addRegisterClass(MVT::v16f32, &AMDGPU::VReg_512RegClass); addRegisterClass(MVT::v8i64, &AMDGPU::SGPR_512RegClass); - addRegisterClass(MVT::v8f64, TRI->getVGPRClassForBitWidth(512)); + addRegisterClass(MVT::v8f64, &AMDGPU::VReg_512RegClass); addRegisterClass(MVT::v16i64, &AMDGPU::SGPR_1024RegClass); - addRegisterClass(MVT::v16f64, TRI->getVGPRClassForBitWidth(1024)); + addRegisterClass(MVT::v16f64, &AMDGPU::VReg_1024RegClass); if (Subtarget->has16BitInsts()) { if (Subtarget->useRealTrue16Insts()) { @@ -188,7 +188,7 @@ SITargetLowering::SITargetLowering(const TargetMachine &TM, } addRegisterClass(MVT::v32i32, &AMDGPU::VReg_1024RegClass); - addRegisterClass(MVT::v32f32, TRI->getVGPRClassForBitWidth(1024)); + addRegisterClass(MVT::v32f32, &AMDGPU::VReg_1024RegClass); computeRegisterProperties(Subtarget->getRegisterInfo()); diff --git a/llvm/test/CodeGen/AMDGPU/mfma-loop.ll b/llvm/test/CodeGen/AMDGPU/mfma-loop.ll index d39daaade677f..3657a6b1b7415 100644 --- a/llvm/test/CodeGen/AMDGPU/mfma-loop.ll +++ b/llvm/test/CodeGen/AMDGPU/mfma-loop.ll @@ -2430,8 +2430,9 @@ define amdgpu_kernel void @test_mfma_nested_loop_zeroinit(ptr addrspace(1) %arg) ; GFX90A-NEXT:v_accvgpr_mov_b32 a29, a0 ; GFX90A-NEXT:v_accvgpr_mov_b32 a30, a0 ; GFX90A-NEXT
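The use-time narrowing the commit message relies on can be expressed with the existing TargetRegisterInfo API; a sketch, with the wrapper name invented for illustration:

#include "llvm/CodeGen/TargetRegisterInfo.h"

using namespace llvm;

// If a value carries the unaligned VReg_128 class and a use requires an
// aligned class such as VReg_128_Align2, the intersection is the aligned
// subclass; nullptr means the two constraints are incompatible.
static const TargetRegisterClass *
constrainForUse(const TargetRegisterInfo &TRI,
                const TargetRegisterClass *ValueRC,
                const TargetRegisterClass *UseRC) {
  return TRI.getCommonSubClass(ValueRC, UseRC);
}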
[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)
ergawy wrote: > Thanks for handling my comments. It looks good to me, but I have one question. > This patch sets the insertion point so that the store instruction gets generated > in the correct place. But the test does not check for any store instruction. > I was just wondering if the test is checking the right thing. Without the changes in the PR, the test crashes flang. However, I agree that the test should be expanded a bit. Added more checks to better capture the code-gen of the reduction. https://github.com/llvm/llvm-project/pull/156837 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
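For readers following along, the fix under discussion concerns where the combined value is stored back when a combiner region spans several blocks. A rough sketch of the pattern, with all names invented rather than taken from the actual flang translation code:

#include "llvm/IR/IRBuilder.h"

using namespace llvm;

// Emit the store after the whole combiner region: inserting at the region's
// exit block guarantees the store follows every combiner block, which a
// single-block assumption would violate.
static void emitCombinedStore(IRBuilderBase &Builder, Value *Combined,
                              Value *ReductionVar, BasicBlock *ExitBB) {
  Builder.SetInsertPoint(ExitBB, ExitBB->getFirstInsertionPt());
  Builder.CreateStore(Combined, ReductionVar);
}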
[llvm-branch-commits] [llvm] [DA] Add test where ExactSIV misses dependency due to overflow (NFC) (PR #157085)
@@ -807,3 +807,123 @@ for.body: ; preds = %entry, %for.body for.end: ; preds = %for.body ret void } + +;; max_i = INT64_MAX/6 // 1537228672809129301 +;; for (long long i = 0; i <= max_i; i++) { +;; A[-6*i + INT64_MAX] = 0; +;; if (i) +;; A[3*i - 2] = 1; +;; } +;; +;; FIXME: There is a loop-carried dependency between +;; `A[-6*i + INT64_MAX]` and `A[3*i - 2]`. For example, kasuga-fj wrote: Thanks, fixed https://github.com/llvm/llvm-project/pull/157085 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
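For anyone verifying the FIXME's claim by hand (this pair is worked out here, not quoted from the truncated test comment): at i = INT64_MAX/6 the first store hits A[-6*i + INT64_MAX] = A[1], and at i = 1 the second store hits A[3*1 - 2] = A[1], so the two accesses alias across iterations. A quick check:

#include <cstdint>
#include <cstdio>

int main() {
  const int64_t max_i = INT64_MAX / 6; // 1537228672809129301
  // First store's index at i == max_i; no step here overflows int64_t.
  std::printf("%lld\n", static_cast<long long>(-6 * max_i + INT64_MAX)); // 1
  // Second store's index at i == 1.
  std::printf("%lld\n", static_cast<long long>(3 * 1 - 2)); // 1
  return 0;
}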