[llvm-branch-commits] [llvm] [AMDGPU] efficiently wait for direct loads to LDS at all scopes (PR #147258)

2025-07-09 Thread Sameer Sahasrabuddhe via llvm-branch-commits
ssahasra wrote: Note that the best way to see the effect of this PR is to view only the second diff of the two in this PR. It shows how the missing vmcnt(0) shows up in the new test introduced by the first commit. https://github.com/llvm/llvm-project/pull/147258

[llvm-branch-commits] [llvm] [AMDGPU] efficiently wait for direct loads to LDS at all scopes (PR #147258)

2025-07-09 Thread Sameer Sahasrabuddhe via llvm-branch-commits
https://github.com/ssahasra updated https://github.com/llvm/llvm-project/pull/147258

>From 95ffad8e0c22f261999f8a87abde8592c0596395 Mon Sep 17 00:00:00 2001
From: Sameer Sahasrabuddhe
Date: Tue, 17 Jun 2025 13:11:55 +0530
Subject: [PATCH 1/2] [AMDGCN] pre-checkin test for LDS DMA and release o

[llvm-branch-commits] [llvm] [AMDGPU] always emit a soft wait even if it is trivially ~0 (PR #147257)

2025-07-07 Thread Sameer Sahasrabuddhe via llvm-branch-commits
https://github.com/ssahasra edited https://github.com/llvm/llvm-project/pull/147257 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU] always emit a soft wait even if it is trivially ~0 (PR #147257)

2025-07-07 Thread Sameer Sahasrabuddhe via llvm-branch-commits
@@ -669,6 +679,7 @@ define amdgpu_kernel void @global_volatile_store_1(
 ; GFX12-WGP-NEXT: s_wait_kmcnt 0x0
 ; GFX12-WGP-NEXT: s_wait_storecnt 0x0
 ; GFX12-WGP-NEXT: global_store_b32 v0, v1, s[0:1] scope:SCOPE_SYS
+; GFX12-WGP-NEXT: s_wait_loadcnt 0x3f
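
The new check line waits for loadcnt to drop to 0x3f. Reading 0x3f as the all-ones value of the loadcnt field (an inference from the check line; a 6-bit field is assumed here), the wait is already satisfied when it issues, which appears to be what the title means by "trivially ~0". A minimal sketch of that arithmetic, using hypothetical helpers `counterMax`, `encodeWait`, and `isTriviallySatisfied` that are not SIInsertWaitcnts functions:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not the LLVM implementation.
// A wait "s_wait_<cnt> N" stalls until at most N operations tracked by that
// counter are still outstanding.

// Largest value a Bits-wide counter field can hold, e.g. 0x3f for 6 bits.
constexpr std::uint32_t counterMax(unsigned Bits) { return (1u << Bits) - 1; }

// Clamp a requested count (~0u meaning "no wait needed") into the field.
std::uint32_t encodeWait(std::uint32_t Requested, unsigned Bits) {
  return std::min(Requested, counterMax(Bits));
}

// In this model the tracked counter never exceeds counterMax(Bits), so a
// wait on that value is satisfied immediately: a no-op ("soft") wait.
bool isTriviallySatisfied(std::uint32_t Encoded, unsigned Bits) {
  return Encoded >= counterMax(Bits);
}
```

For example, `encodeWait(~0u, 6)` yields 0x3f, the same immediate as the added check line, while `encodeWait(0, 6)` yields a real wait for all loads.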

[llvm-branch-commits] [llvm] [AMDGPU] efficiently wait for direct loads to LDS at all scopes (PR #147258)

2025-07-07 Thread Sameer Sahasrabuddhe via llvm-branch-commits
ssahasra wrote: This is part of a stack:
- #147258
- #147257
- #147256

https://github.com/llvm/llvm-project/pull/147258

[llvm-branch-commits] [llvm] [AMDGPU] always emit a soft wait even if it is trivially ~0 (PR #147257)

2025-07-07 Thread Sameer Sahasrabuddhe via llvm-branch-commits
ssahasra wrote: This is part of a stack:
- #147258
- #147257
- #147256

https://github.com/llvm/llvm-project/pull/147257

[llvm-branch-commits] [llvm] [AMDGPU] efficiently wait for direct loads to LDS at all scopes (PR #147258)

2025-07-07 Thread Sameer Sahasrabuddhe via llvm-branch-commits
https://github.com/ssahasra created https://github.com/llvm/llvm-project/pull/147258 Currently, the memory legalizer does not generate any wait on vmcnt at workgroup scope. This is incorrect because direct loads to LDS are tracked using vmcnt and they need to be released properly at workgroup sc
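
To make the hazard concrete, here is a toy model in plain C++ (not SIMemoryLegalizer code; `ToyWorkgroup` and its methods are hypothetical). It assumes, as the description states, that a direct load to LDS is counted by vmcnt, and it simplifies timing so that an in-flight load writes LDS only when a wait forces it to retire. A workgroup-scope release that does not wait on vmcnt lets another wave read stale LDS; one that waits for vmcnt(0) does not.

```cpp
#include <cassert>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical toy model of one workgroup's LDS and its in-flight
// direct-to-LDS loads. vmcnt is modelled as the number of queued loads.
struct ToyWorkgroup {
  std::vector<int> Lds = std::vector<int>(4, 0);
  std::deque<std::pair<unsigned, int>> InFlight;  // (LDS index, value)

  unsigned vmcnt() const { return static_cast<unsigned>(InFlight.size()); }

  // Issue a direct load from global memory into LDS[Idx].
  void issueLdsDma(unsigned Idx, int Value) { InFlight.push_back({Idx, Value}); }

  // Retire the oldest load; this model assumes loads retire in issue order.
  void completeOldest() {
    std::pair<unsigned, int> Op = InFlight.front();
    InFlight.pop_front();
    Lds[Op.first] = Op.second;
  }

  // Like a vmcnt(N) wait: block until at most N loads are outstanding.
  void waitVmcnt(unsigned N) {
    while (vmcnt() > N)
      completeOldest();
  }

  // A workgroup-scope release; EmitVmcntWait models whether the compiler
  // emitted a vmcnt(0) wait as part of it.
  void releaseWorkgroup(bool EmitVmcntWait) {
    if (EmitVmcntWait)
      waitVmcnt(0);
  }

  // Another wave of the same workgroup reads LDS after synchronizing.
  int readLds(unsigned Idx) const { return Lds[Idx]; }
};
```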

[llvm-branch-commits] [clang] [llvm] [clang] Redefine `noconvergent` and generate convergence control tokens (PR #136282)

2025-04-21 Thread Sameer Sahasrabuddhe via llvm-branch-commits
https://github.com/ssahasra edited https://github.com/llvm/llvm-project/pull/136282

[llvm-branch-commits] [clang] [llvm] [clang] Redefine `noconvergent` and generate convergence control tokens (PR #136282)

2025-04-18 Thread Sameer Sahasrabuddhe via llvm-branch-commits
https://github.com/ssahasra created https://github.com/llvm/llvm-project/pull/136282 This introduces the `-fconvergence-control` flag that emits convergence control intrinsics which are then used as the `convergencectrl` operand bundle on convergent calls. This also redefines the `noconvergen

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)

2025-02-04 Thread Sameer Sahasrabuddhe via llvm-branch-commits
https://github.com/ssahasra commented: The changes to UA look good to me. I can't comment much about the actual patch itself. https://github.com/llvm/llvm-project/pull/124298

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)

2025-01-30 Thread Sameer Sahasrabuddhe via llvm-branch-commits
@@ -342,6 +342,10 @@ template class GenericUniformityAnalysisImpl {
       typename SyncDependenceAnalysisT::DivergenceDescriptor;
   using BlockLabelMapT = typename SyncDependenceAnalysisT::BlockLabelMap;
+  // Use outside cycle with divergent exit
+  using UOCWDE = -
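
The "use outside cycle with divergent exit" in this hunk is the temporal-divergence case: a value that is uniform on every iteration inside the cycle becomes divergent when read after the cycle, because lanes leave in different iterations. A lane-by-lane simulation makes this visible; `runLoop` and `ExitAt` are hypothetical names, and this is plain C++, not GenericUniformityAnalysis code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simulation of one loop run by a wavefront in lockstep.
// ExitAt[Lane] (assumed >= 0) is the iteration in which that lane's exit
// condition becomes true, so the exit is divergent when the values differ.
// Returns the loop counter value each lane observes after the loop.
std::vector<int> runLoop(const std::vector<int> &ExitAt) {
  std::vector<int> Counter(ExitAt.size(), 0);
  std::vector<bool> Active(ExitAt.size(), true);
  bool AnyActive = true;
  for (int I = 0; AnyActive; ++I) {
    AnyActive = false;
    for (std::size_t Lane = 0; Lane < ExitAt.size(); ++Lane) {
      if (!Active[Lane])
        continue;
      // Inside the loop every active lane sees the same I: it is uniform.
      Counter[Lane] = I;
      if (I == ExitAt[Lane])
        Active[Lane] = false;  // this lane leaves the loop now
      else
        AnyActive = true;
    }
  }
  return Counter;
}
```

With `runLoop({2, 5, 2, 3})` the lanes end with counters 2, 5, 2 and 3, so a use after the loop is divergent; with a uniform exit such as `runLoop({4, 4})` every lane ends with 4.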

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)

2025-01-30 Thread Sameer Sahasrabuddhe via llvm-branch-commits
@@ -188,6 +190,37 @@ void DivergenceLoweringHelper::constrainAsLaneMask(Incoming &In) {
   In.Reg = Copy.getReg(0);
 }
+void replaceUsesOfRegInInstWith(Register Reg, MachineInstr *Inst,
+                                Register NewReg) {
+  for (MachineOperand &Op : Inst->opera
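
The helper in this hunk is cut off at its operand loop. A self-contained sketch of what such a helper typically does, written against hypothetical stand-in types `Reg`, `Operand`, and `Instr` rather than LLVM's `Register`, `MachineOperand`, and `MachineInstr`. Skipping def operands is an assumption of this sketch, not something the truncated diff shows.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for Register / MachineOperand / MachineInstr.
using Reg = unsigned;

struct Operand {
  bool IsReg;  // false for immediates and other non-register operands
  bool IsDef;  // true if the operand writes the register
  Reg R;
};

struct Instr {
  std::vector<Operand> Ops;
};

// Rewrite every register use of OldR in Inst to NewR; defs and
// non-register operands are left untouched (an assumption here).
void replaceUsesOfRegInInstWith(Reg OldR, Instr *Inst, Reg NewR) {
  for (Operand &Op : Inst->Ops)
    if (Op.IsReg && !Op.IsDef && Op.R == OldR)
      Op.R = NewR;
}
```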

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)

2025-01-29 Thread Sameer Sahasrabuddhe via llvm-branch-commits
@@ -342,6 +342,10 @@ template class GenericUniformityAnalysisImpl {
       typename SyncDependenceAnalysisT::DivergenceDescriptor;
   using BlockLabelMapT = typename SyncDependenceAnalysisT::BlockLabelMap;
+  // Use outside cycle with divergent exit
+  using UOCWDE = -

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)

2025-01-29 Thread Sameer Sahasrabuddhe via llvm-branch-commits
@@ -1210,6 +1240,13 @@ void GenericUniformityAnalysisImpl::print(raw_ostream &OS) const {
   }
 }
+template
+iterator_range::UOCWDE *>

ssahasra wrote: Just say `auto` as the return type here? Or if this needs to be exposed in an outer header file, then nam
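
The review suggestion is to let the compiler deduce the return type instead of spelling out a long dependent `iterator_range` type. A minimal C++14 sketch of that pattern, with hypothetical names (`Analysis`, `UseRecord`, `uses`) and a small hand-rolled `IterRange` standing in for `llvm::iterator_range`:

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal stand-in for llvm::iterator_range.
template <typename It> struct IterRange {
  It First, Last;
  It begin() const { return First; }
  It end() const { return Last; }
};

template <typename InstT> class Analysis {
public:
  struct UseRecord {
    const InstT *Inst;
    int CycleId;
  };

  void record(const InstT *I, int CycleId) { Uses.push_back({I, CycleId}); }

  // Deduced return type: callers never spell the nested, dependent type.
  auto uses() const;

private:
  std::vector<UseRecord> Uses;
};

// Out-of-line definition; inside the body, UseRecord resolves in class scope.
template <typename InstT> auto Analysis<InstT>::uses() const {
  using It = typename std::vector<UseRecord>::const_iterator;
  return IterRange<It>{Uses.begin(), Uses.end()};
}
```

One trade-off: a function with a deduced return type must be defined before any call that needs its type, so the definition has to be visible (for example, in the header). When that is not possible, a named type alias is the alternative.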

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)

2025-01-29 Thread Sameer Sahasrabuddhe via llvm-branch-commits
@@ -40,6 +40,10 @@ template class GenericUniformityInfo {
   using CycleInfoT = GenericCycleInfo;
   using CycleT = typename CycleInfoT::CycleT;
+  // Use outside cycle with divergent exit
+  using UOCWDE =

ssahasra wrote: This declaration got repeated. One of

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)

2025-01-29 Thread Sameer Sahasrabuddhe via llvm-branch-commits
https://github.com/ssahasra edited https://github.com/llvm/llvm-project/pull/124298

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)

2025-01-29 Thread Sameer Sahasrabuddhe via llvm-branch-commits
@@ -395,6 +399,14 @@ template class GenericUniformityAnalysisImpl {
   }
   void print(raw_ostream &out) const;
+  SmallVector UsesOutsideCycleWithDivergentExit;
+  void recordUseOutsideCycleWithDivergentExit(const InstructionT *,

ssahasra wrote: You're right

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Temporal divergence lowering (non i1) (PR #124298)

2025-01-28 Thread Sameer Sahasrabuddhe via llvm-branch-commits
@@ -395,6 +399,14 @@ template class GenericUniformityAnalysisImpl {
   }
   void print(raw_ostream &out) const;
+  SmallVector UsesOutsideCycleWithDivergentExit;
+  void recordUseOutsideCycleWithDivergentExit(const InstructionT *,

ssahasra wrote: Everywhere i

[llvm-branch-commits] [clang] [llvm] AMDGPU: Fix libcall recognition of image array types (PR #119832)

2024-12-15 Thread Sameer Sahasrabuddhe via llvm-branch-commits
@@ -622,9 +622,9 @@ bool ItaniumParamParser::parseItaniumParam(StringRef& param,
   if (isDigit(TC)) {
     res.ArgType = StringSwitch(eatLengthPrefixedName(param))
-        .Case("ocl_image1darray", AMDGPULibFunc::IMG1DA)
-        .Case("ocl_image1dbuffer", AMDGP
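
In Itanium mangling, a source name is encoded as its decimal length followed by the identifier, which is why the parser above eats a length-prefixed name before matching it. A standalone sketch of that step plus a lookup; `eatLengthPrefixed`, `ImgKind`, and `classify` are hypothetical, and the two table spellings are copied from the removed diff lines purely as sample input, not as the spellings the patch settles on.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical enum; the real parser maps names to AMDGPULibFunc values.
enum class ImgKind { None, Image1DArray, Image1DBuffer };

// Consume "<decimal length><identifier>" from the front of S and return the
// identifier. On malformed input, return an empty view and leave S unchanged.
std::string_view eatLengthPrefixed(std::string_view &S) {
  std::size_t Len = 0, I = 0;
  while (I < S.size() && std::isdigit(static_cast<unsigned char>(S[I])))
    Len = Len * 10 + static_cast<std::size_t>(S[I++] - '0');
  if (I == 0 || Len == 0 || Len > S.size() - I)
    return {};
  std::string_view Name = S.substr(I, Len);
  S.remove_prefix(I + Len);
  return Name;
}

// Illustrative table only.
ImgKind classify(std::string_view Name) {
  if (Name == "ocl_image1darray")
    return ImgKind::Image1DArray;
  if (Name == "ocl_image1dbuffer")
    return ImgKind::Image1DBuffer;
  return ImgKind::None;
}
```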

[llvm-branch-commits] [clang] [llvm] AMDGPU: Fix libcall recognition of image array types (PR #119832)

2024-12-15 Thread Sameer Sahasrabuddhe via llvm-branch-commits
https://github.com/ssahasra approved this pull request. https://github.com/llvm/llvm-project/pull/119832

[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)

2024-08-22 Thread Sameer Sahasrabuddhe via llvm-branch-commits
https://github.com/ssahasra reopened https://github.com/llvm/llvm-project/pull/101386

[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)

2024-08-22 Thread Sameer Sahasrabuddhe via llvm-branch-commits
https://github.com/ssahasra closed https://github.com/llvm/llvm-project/pull/101386

[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)

2024-08-21 Thread Sameer Sahasrabuddhe via llvm-branch-commits
https://github.com/ssahasra edited https://github.com/llvm/llvm-project/pull/101386

[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)

2024-08-21 Thread Sameer Sahasrabuddhe via llvm-branch-commits
ssahasra wrote:
> Note that I have not yet finished verifying all the lit tests. I might also
> have to add a few more tests, especially involving a mix of irreducible and
> reducible cycles that are siblings and/or nested inside each other in various
> combinations. Especially with some overl

[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)

2024-08-21 Thread Sameer Sahasrabuddhe via llvm-branch-commits
ssahasra wrote:
> This needs a finer method that redirects only specific edges. Either that, or
> we let the pass destroy some cycles. But updating `CycleInfo` for these
> missing subcycles may be a fair amount of work too, so I would rather do it
> the right way.

This now depends on the newl

[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)

2024-08-21 Thread Sameer Sahasrabuddhe via llvm-branch-commits
@@ -107,6 +107,12 @@ template class GenericCycle {
     return is_contained(Entries, Block);
   }
+  /// \brief Replace all entries with \p Block as single entry.
+  void setSingleEntry(BlockT *Block) {
+    Entries.clear();
+    Entries.push_back(Block);

ssaha
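
A reduced model of what `setSingleEntry` does to a cycle: once FixIrreducible routes every entering edge through one new header block, the entry list collapses to that block and the cycle becomes reducible. `ToyCycle`, its string block names, and `isReducible` here are hypothetical simplifications; the hunk shows the real `GenericCycle` keeping `BlockT *` entries in a vector-like container.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical reduced cycle; blocks are identified by name.
struct ToyCycle {
  std::vector<std::string> Entries;  // blocks entered from outside the cycle
  std::vector<std::string> Blocks;   // all blocks in the cycle

  bool isEntry(const std::string &B) const {
    return std::find(Entries.begin(), Entries.end(), B) != Entries.end();
  }

  // A cycle with more than one entry is irreducible.
  bool isReducible() const { return Entries.size() == 1; }

  // Mirrors the diff: replace all entries with Block as the single entry.
  void setSingleEntry(const std::string &Block) {
    Entries.clear();
    Entries.push_back(Block);
  }
};
```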

[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #101386)

2024-08-21 Thread Sameer Sahasrabuddhe via llvm-branch-commits
@@ -189,6 +195,21 @@ template class GenericCycle {
   //@{
   using const_entry_iterator =
       typename SmallVectorImpl::const_iterator;
+  const_entry_iterator entry_begin() const {
+    return const_entry_iterator{Entries.begin()};

ssahasra wrote: Fixed. h
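
The accessors in this hunk expose a cycle's entries through const iterators, so callers can walk the entries without being able to modify them. A sketch of the same shape with standard containers; `ToyCycle`, `EntryRange`, and `entries()` are hypothetical names, not necessarily what `GenericCycle` provides.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cycle exposing read-only access to its entry blocks.
class ToyCycle {
public:
  using const_entry_iterator = std::vector<std::string>::const_iterator;

  explicit ToyCycle(std::vector<std::string> E) : Entries(std::move(E)) {}

  const_entry_iterator entry_begin() const { return Entries.begin(); }
  const_entry_iterator entry_end() const { return Entries.end(); }

  // Small range wrapper so callers can use a range-based for loop.
  struct EntryRange {
    const_entry_iterator B, E;
    const_entry_iterator begin() const { return B; }
    const_entry_iterator end() const { return E; }
  };
  EntryRange entries() const { return {entry_begin(), entry_end()}; }

private:
  std::vector<std::string> Entries;
};
```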

[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #103014)

2024-08-13 Thread Sameer Sahasrabuddhe via llvm-branch-commits
https://github.com/ssahasra closed https://github.com/llvm/llvm-project/pull/103014

[llvm-branch-commits] [llvm] [FixIrreducible] Use CycleInfo instead of a custom SCC traversal (PR #103014)

2024-08-13 Thread Sameer Sahasrabuddhe via llvm-branch-commits
https://github.com/ssahasra created https://github.com/llvm/llvm-project/pull/103014

1. CycleInfo efficiently locates all cycles in a single pass, while the SCC traversal is repeated inside every natural loop.
2. CycleInfo provides a hierarchy of irreducible cycles, and the new implementation transform
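
Point 2 rests on the distinction between reducible and irreducible cycles: a cycle is irreducible when control can enter it at more than one block. Given a CFG and a cycle's block set, the entries are the cycle blocks with a predecessor outside the set. This sketch computes them over a hypothetical adjacency-list CFG (`Block`, `CFG`, `cycleEntries`, `isIrreducible` are illustrative names). It only classifies a given cycle and does not reproduce how CycleInfo discovers cycles.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using Block = int;
using CFG = std::map<Block, std::vector<Block>>;  // block -> successors

// Entry blocks of a cycle: members with at least one predecessor outside it.
std::set<Block> cycleEntries(const CFG &G, const std::set<Block> &Cycle) {
  std::set<Block> Entries;
  for (const auto &[From, Succs] : G) {
    if (Cycle.count(From))
      continue;  // edges starting inside the cycle cannot create entries
    for (Block To : Succs)
      if (Cycle.count(To))
        Entries.insert(To);
  }
  return Entries;
}

bool isIrreducible(const CFG &G, const std::set<Block> &Cycle) {
  return cycleEntries(G, Cycle).size() > 1;
}
```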

[llvm-branch-commits] [llvm] [Transforms] Refactor CreateControlFlowHub (PR #103013)

2024-08-12 Thread Sameer Sahasrabuddhe via llvm-branch-commits
https://github.com/ssahasra edited https://github.com/llvm/llvm-project/pull/103013

[llvm-branch-commits] [llvm] [Transforms] Refactor CreateControlFlowHub (PR #103013)

2024-08-12 Thread Sameer Sahasrabuddhe via llvm-branch-commits
https://github.com/ssahasra created https://github.com/llvm/llvm-project/pull/103013 CreateControlFlowHub is a method that redirects control flow edges from a set of incoming blocks to a set of outgoing blocks through a new set of "guard" blocks. This is now refactored into a separate file wit
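
The idea behind a control-flow hub can be shown without LLVM. Each incoming block records which outgoing block it wants in a selector and jumps to the first guard; each guard tests one choice and otherwise falls through to the next guard. This sketch models the guard chain as data with hypothetical names (`Guard`, `Hub`, `buildHub`, `route`) and uses N-1 guards for N outgoing blocks; it is not the refactored LLVM utility.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One guard block: if the selector equals Choice, branch to Target;
// otherwise continue to the next guard.
struct Guard {
  int Choice;
  std::string Target;
};

struct Hub {
  std::vector<Guard> Guards;
  std::string Fallback;  // false successor of the last guard
};

// Build a guard chain so that selector value I reaches Outgoing[I].
// Assumes Outgoing is non-empty.
Hub buildHub(const std::vector<std::string> &Outgoing) {
  Hub H;
  for (std::size_t I = 0; I + 1 < Outgoing.size(); ++I)
    H.Guards.push_back({static_cast<int>(I), Outgoing[I]});
  H.Fallback = Outgoing.back();
  return H;
}

// Follow the chain the way control flow would for a given selector value.
std::string route(const Hub &H, int Selector) {
  for (const Guard &G : H.Guards)
    if (G.Choice == Selector)
      return G.Target;
  return H.Fallback;
}
```

After the rewrite every incoming block has a single successor (the first guard), which is the property that passes such as FixIrreducible and UnifyLoopExits rely on when they use the hub.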

[llvm-branch-commits] [llvm] [Attributor][AMDGPU] Improve the handling of indirect calls (PR #100954)

2024-07-28 Thread Sameer Sahasrabuddhe via llvm-branch-commits
ssahasra wrote: The apparent change here is simply to reverse the effect of #100952 on the lit test. It would be good to have a test that shows what the improvement is. Also, I think #100952 merely enables AAIndirectCallInfo and feels like an integral part of this change itself. I would lean to

[llvm-branch-commits] [llvm] c540ce9 - [AMDGPU] pin lit test divergent-unswitch.ll to the old pass manager

2021-01-20 Thread Sameer Sahasrabuddhe via llvm-branch-commits
Author: Sameer Sahasrabuddhe
Date: 2021-01-20T22:02:09+05:30
New Revision: c540ce9900ff99566b4951186e2f070b3b36cdbe

URL: https://github.com/llvm/llvm-project/commit/c540ce9900ff99566b4951186e2f070b3b36cdbe
DIFF: https://github.com/llvm/llvm-project/commit/c540ce9900ff99566b4951186e2f070b3b36cdb