[llvm-branch-commits] [llvm] [AMDGPU][gfx1250] Implement SIMemoryLegalizer (PR #154726)

2025-08-21 Thread Christudasan Devadasan via llvm-branch-commits


@@ -1656,6 +1656,11 @@ let OtherPredicates = [HasImageInsts] in {
   def S_WAIT_KMCNT_soft : SOPP_Pseudo <"s_soft_wait_kmcnt", (ins 
s16imm:$simm16), "$simm16">;
 }
 
+
+let SubtargetPredicate = HasWaitXcnt in {

cdevadas wrote:

This isn't a subtarget predicate. Use `OtherPredicates` instead.

https://github.com/llvm/llvm-project/pull/154726
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [analyzer][docs] CSA release notes for clang-21 (PR #154600)

2025-08-21 Thread Arseniy Zaostrovnykh via llvm-branch-commits

https://github.com/necto approved this pull request.


https://github.com/llvm/llvm-project/pull/154600


[llvm-branch-commits] [clang] [analyzer][docs] CSA release notes for clang-21 (PR #154600)

2025-08-21 Thread Arseniy Zaostrovnykh via llvm-branch-commits


@@ -1246,6 +1323,9 @@ Moved checkers
   checker ``alpha.security.ArrayBound`` (which was searching for the same kind
   of bugs with an different, simpler and less accurate algorithm) is removed.

necto wrote:

```suggestion
  of bugs with a different, simpler and less accurate algorithm) is removed.
```

https://github.com/llvm/llvm-project/pull/154600


[llvm-branch-commits] [clang] [analyzer][docs] CSA release notes for clang-21 (PR #154600)

2025-08-21 Thread Balazs Benics via llvm-branch-commits

steakhal wrote:

> Looks good to me. Are all of those crashes present in previously released 
> stable versions?

To the best of my knowledge, yes. I also checked that no entries refer to commits 
that are only present on `main`.
(Well, there were two, which are now in the backport pipe.)
I also excluded crashes in new features, such as the `assume` handling crashes, 
because a new feature was never released in the first place.

You can spotcheck this though.

https://github.com/llvm/llvm-project/pull/154600


[llvm-branch-commits] [flang] [flang][OpenMP] move omp end sections validation to semantics (PR #154740)

2025-08-21 Thread Tom Eccles via llvm-branch-commits

https://github.com/tblah created 
https://github.com/llvm/llvm-project/pull/154740

See #90452. The old parse tree errors exploded to thousands of unhelpful lines 
when there were multiple missing end directives.

Instead, allow a missing end directive in the parse tree then validate that it 
is present during semantics (where the error messages are a lot easier to 
control).
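
The idea can be illustrated with a minimal, self-contained sketch. It is not the flang implementation: `BeginDirective`, `EndDirective`, `SectionsConstruct`, and `checkSections` are hypothetical stand-ins, and plain strings take the place of source locations. The only part taken from this patch is the shape of the change (an optional end directive in the tuple, plus a check that reports its absence) and the wording of the error message.

```cpp
#include <optional>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for parse-tree nodes; not the real flang types.
struct BeginDirective { std::string source; };
struct EndDirective { std::string source; };

struct SectionsConstruct {
  // The end directive is optional so that the parser can accept the
  // construct even when the end directive is missing.
  std::tuple<BeginDirective, std::optional<EndDirective>> t;
};

// Semantic check: emit one targeted error per construct that lacks an end
// directive, anchored at the begin directive.
std::vector<std::string> checkSections(
    const std::vector<SectionsConstruct> &constructs) {
  std::vector<std::string> errors;
  for (const auto &c : constructs) {
    const auto &begin = std::get<BeginDirective>(c.t);
    const auto &end = std::get<std::optional<EndDirective>>(c.t);
    if (!end) {
      errors.push_back(
          begin.source + ": Expected OpenMP END SECTIONS directive");
      continue; // Later checks would assume the end directive is present.
    }
    // Checks that need both directives would go here.
  }
  return errors;
}
```

With two constructs, one of which has no end directive, `checkSections` returns exactly one message, so the amount of error output grows with the number of broken constructs and not with the parser's backtracking.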

>From 0f03fe4c01293b50ffed454a99bc18c80132c168 Mon Sep 17 00:00:00 2001
From: Tom Eccles 
Date: Wed, 20 Aug 2025 17:19:24 +
Subject: [PATCH] [flang][OpenMP] move omp end sections validation to semantics

See #90452. The old parse tree errors exploded to thousands of unhelpful
lines when there were multiple missing end directives.

Instead, allow a missing end directive in the parse tree then validate
that it is present during semantics (where the error messages are a lot
easier to control).
---
 flang/include/flang/Parser/parse-tree.h |  5 -
 flang/lib/Lower/OpenMP/OpenMP.cpp   |  7 +--
 flang/lib/Parser/openmp-parsers.cpp |  2 +-
 flang/lib/Parser/unparse.cpp|  2 +-
 flang/lib/Semantics/check-omp-structure.cpp | 17 +
 .../Semantics/OpenMP/missing-end-directive.f90  |  4 
 6 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/flang/include/flang/Parser/parse-tree.h 
b/flang/include/flang/Parser/parse-tree.h
index 38ec605574c06..9962a25c29600 100644
--- a/flang/include/flang/Parser/parse-tree.h
+++ b/flang/include/flang/Parser/parse-tree.h
@@ -4903,8 +4903,11 @@ struct OpenMPSectionsConstruct {
   CharBlock source;
   // Each of the OpenMPConstructs in the list below contains an
   // OpenMPSectionConstruct. This is guaranteed by the parser.
+  // The end sections directive is optional here because it is difficult to
+  // generate helpful error messages for a missing end directive within the
+  // parser. Semantics will generate an error if this is absent.
   std::tuple<OmpBeginSectionsDirective, std::list<OpenMPConstruct>,
-      OmpEndSectionsDirective>
+      std::optional<OmpEndSectionsDirective>>
       t;
 };
 
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp 
b/flang/lib/Lower/OpenMP/OpenMP.cpp
index ec2ec37e623f8..da5898480da22 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -3958,9 +3958,12 @@ static void genOMP(lower::AbstractConverter &converter, 
lower::SymMap &symTable,
   List clauses = makeClauses(
   std::get(beginSectionsDirective.t), semaCtx);
   const auto &endSectionsDirective =
-  std::get(sectionsConstruct.t);
+  std::get>(
+  sectionsConstruct.t);
+  assert(endSectionsDirective &&
+ "Missing end section directive should have been handled in 
semantics");
   clauses.append(makeClauses(
-  std::get(endSectionsDirective.t), semaCtx));
+  std::get(endSectionsDirective->t), semaCtx));
   mlir::Location currentLocation = converter.getCurrentLocation();
 
   llvm::omp::Directive directive =
diff --git a/flang/lib/Parser/openmp-parsers.cpp 
b/flang/lib/Parser/openmp-parsers.cpp
index 9670302c8549b..d7db3d55c3e17 100644
--- a/flang/lib/Parser/openmp-parsers.cpp
+++ b/flang/lib/Parser/openmp-parsers.cpp
@@ -1919,7 +1919,7 @@ TYPE_PARSER(sourced(construct(
 construct(maybe(sectionDir), block))),
 many(construct(
 sourced(construct(sectionDir, block),
-Parser{} / endOmpLine)))
+maybe(Parser{} / endOmpLine
 
 static bool IsExecutionPart(const OmpDirectiveName &name) {
   return name.IsExecutionPart();
diff --git a/flang/lib/Parser/unparse.cpp b/flang/lib/Parser/unparse.cpp
index 09dcfe60a46bc..87e699dbc4e8d 100644
--- a/flang/lib/Parser/unparse.cpp
+++ b/flang/lib/Parser/unparse.cpp
@@ -2788,7 +2788,7 @@ class UnparseVisitor {
 Walk(std::get>(x.t), "");
 BeginOpenMP();
 Word("!$OMP END ");
-Walk(std::get(x.t));
+Walk(std::get>(x.t));
 Put("\n");
 EndOpenMP();
   }
diff --git a/flang/lib/Semantics/check-omp-structure.cpp 
b/flang/lib/Semantics/check-omp-structure.cpp
index 0bdd2c62f88ce..4a52e06cccb1f 100644
--- a/flang/lib/Semantics/check-omp-structure.cpp
+++ b/flang/lib/Semantics/check-omp-structure.cpp
@@ -1047,14 +1047,23 @@ void OmpStructureChecker::Leave(const 
parser::OmpBeginDirective &) {
 void OmpStructureChecker::Enter(const parser::OpenMPSectionsConstruct &x) {
   const auto &beginSectionsDir{
   std::get(x.t)};
-  const auto &endSectionsDir{std::get(x.t)};
+  const auto &endSectionsDir{
+  std::get>(x.t)};
   const auto &beginDir{
   std::get(beginSectionsDir.t)};
-  const auto &endDir{std::get(endSectionsDir.t)};
+  PushContextAndClauseSets(beginDir.source, beginDir.v);
+
+  if (!endSectionsDir) {
+context_.Say(beginSectionsDir.source,
+"Expected OpenMP END SECTIONS directive"_err_en_US);
+// Following code assumes the option is present.
+return;
+  }
+
+  const auto 
&endDir{std::get(endSectionsDir->t)};
   CheckMatching(beginDir, endDir);
 
-  PushContextAndClauseSets(beginDir.source, beginDir.v);
-  AddEndD

[llvm-branch-commits] [flang] [flang][OpenMP] move omp end sections validation to semantics (PR #154740)

2025-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-flang-semantics

Author: Tom Eccles (tblah)


Changes

See #90452. The old parse tree errors exploded to thousands of 
unhelpful lines when there were multiple missing end directives.

Instead, allow a missing end directive in the parse tree then validate that it 
is present during semantics (where the error messages are a lot easier to 
control).

---
Full diff: https://github.com/llvm/llvm-project/pull/154740.diff


6 Files Affected:

- (modified) flang/include/flang/Parser/parse-tree.h (+4-1) 
- (modified) flang/lib/Lower/OpenMP/OpenMP.cpp (+5-2) 
- (modified) flang/lib/Parser/openmp-parsers.cpp (+1-1) 
- (modified) flang/lib/Parser/unparse.cpp (+1-1) 
- (modified) flang/lib/Semantics/check-omp-structure.cpp (+13-4) 
- (modified) flang/test/Semantics/OpenMP/missing-end-directive.f90 (+4) 


``diff
diff --git a/flang/include/flang/Parser/parse-tree.h 
b/flang/include/flang/Parser/parse-tree.h
index 38ec605574c06..9962a25c29600 100644
--- a/flang/include/flang/Parser/parse-tree.h
+++ b/flang/include/flang/Parser/parse-tree.h
@@ -4903,8 +4903,11 @@ struct OpenMPSectionsConstruct {
   CharBlock source;
   // Each of the OpenMPConstructs in the list below contains an
   // OpenMPSectionConstruct. This is guaranteed by the parser.
+  // The end sections directive is optional here because it is difficult to
+  // generate helpful error messages for a missing end directive within the
+  // parser. Semantics will generate an error if this is absent.
   std::tuple<OmpBeginSectionsDirective, std::list<OpenMPConstruct>,
-      OmpEndSectionsDirective>
+      std::optional<OmpEndSectionsDirective>>
       t;
 };
 
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp 
b/flang/lib/Lower/OpenMP/OpenMP.cpp
index ec2ec37e623f8..da5898480da22 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -3958,9 +3958,12 @@ static void genOMP(lower::AbstractConverter &converter, 
lower::SymMap &symTable,
   List clauses = makeClauses(
   std::get(beginSectionsDirective.t), semaCtx);
   const auto &endSectionsDirective =
-  std::get(sectionsConstruct.t);
+  std::get>(
+  sectionsConstruct.t);
+  assert(endSectionsDirective &&
+ "Missing end section directive should have been handled in 
semantics");
   clauses.append(makeClauses(
-  std::get(endSectionsDirective.t), semaCtx));
+  std::get(endSectionsDirective->t), semaCtx));
   mlir::Location currentLocation = converter.getCurrentLocation();
 
   llvm::omp::Directive directive =
diff --git a/flang/lib/Parser/openmp-parsers.cpp 
b/flang/lib/Parser/openmp-parsers.cpp
index 9670302c8549b..d7db3d55c3e17 100644
--- a/flang/lib/Parser/openmp-parsers.cpp
+++ b/flang/lib/Parser/openmp-parsers.cpp
@@ -1919,7 +1919,7 @@ TYPE_PARSER(sourced(construct(
 construct(maybe(sectionDir), block))),
 many(construct(
 sourced(construct(sectionDir, block),
-Parser{} / endOmpLine)))
+maybe(Parser{} / endOmpLine
 
 static bool IsExecutionPart(const OmpDirectiveName &name) {
   return name.IsExecutionPart();
diff --git a/flang/lib/Parser/unparse.cpp b/flang/lib/Parser/unparse.cpp
index 09dcfe60a46bc..87e699dbc4e8d 100644
--- a/flang/lib/Parser/unparse.cpp
+++ b/flang/lib/Parser/unparse.cpp
@@ -2788,7 +2788,7 @@ class UnparseVisitor {
 Walk(std::get>(x.t), "");
 BeginOpenMP();
 Word("!$OMP END ");
-Walk(std::get(x.t));
+Walk(std::get>(x.t));
 Put("\n");
 EndOpenMP();
   }
diff --git a/flang/lib/Semantics/check-omp-structure.cpp 
b/flang/lib/Semantics/check-omp-structure.cpp
index 0bdd2c62f88ce..4a52e06cccb1f 100644
--- a/flang/lib/Semantics/check-omp-structure.cpp
+++ b/flang/lib/Semantics/check-omp-structure.cpp
@@ -1047,14 +1047,23 @@ void OmpStructureChecker::Leave(const 
parser::OmpBeginDirective &) {
 void OmpStructureChecker::Enter(const parser::OpenMPSectionsConstruct &x) {
   const auto &beginSectionsDir{
   std::get(x.t)};
-  const auto &endSectionsDir{std::get(x.t)};
+  const auto &endSectionsDir{
+  std::get>(x.t)};
   const auto &beginDir{
   std::get(beginSectionsDir.t)};
-  const auto &endDir{std::get(endSectionsDir.t)};
+  PushContextAndClauseSets(beginDir.source, beginDir.v);
+
+  if (!endSectionsDir) {
+context_.Say(beginSectionsDir.source,
+"Expected OpenMP END SECTIONS directive"_err_en_US);
+// Following code assumes the option is present.
+return;
+  }
+
+  const auto 
&endDir{std::get(endSectionsDir->t)};
   CheckMatching(beginDir, endDir);
 
-  PushContextAndClauseSets(beginDir.source, beginDir.v);
-  AddEndDirectiveClauses(std::get(endSectionsDir.t));
+  AddEndDirectiveClauses(std::get(endSectionsDir->t));
 
   const auto &sectionBlocks{std::get<std::list<parser::OpenMPConstruct>>(x.t)};
   for (const parser::OpenMPConstruct &construct : sectionBlocks) {
diff --git a/flang/test/Semantics/OpenMP/missing-end-directive.f90 
b/flang/test/Semantics/OpenMP/missing-end-directive.f90
index 3b870d134155b..33481f9d650f4 100644
--- a/flang/test/Semantics/OpenMP/missing-end-directive.f90
+++ 

[llvm-branch-commits] [clang] [analyzer][docs] CSA release notes for clang-21 (PR #154600)

2025-08-21 Thread Balazs Benics via llvm-branch-commits

https://github.com/steakhal updated 
https://github.com/llvm/llvm-project/pull/154600

>From 282a84dbcc57738398da024f021bcc057099edb3 Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Wed, 20 Aug 2025 21:40:26 +0200
Subject: [PATCH 1/2] [analyzer][docs] CSA release notes for clang-21

The commits were gathered using:
```sh
git log --reverse --oneline llvmorg-20-init..llvm/main \
  clang/{lib/StaticAnalyzer,include/clang/StaticAnalyzer} | grep -v NFC | \
  grep -v OpenACC | grep -v -i revert | grep -v -i "webkit"
```

FYI, I also ignored WebKit changes because I assume they are fairly specific
to them, and they likely already know what they ship xD.

I used the `LLVM_ENABLE_SPHINX=ON` and `LLVM_ENABLE_DOXYGEN=ON` cmake
options to enable the `docs-clang-html` build target, which generates
the html into `build/tools/clang/docs/html/ReleaseNotes.html` of which I
attach the screenshots to let you judge if it looks all good or not.
---
 clang/docs/ReleaseNotes.rst | 90 ++---
 1 file changed, 85 insertions(+), 5 deletions(-)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index f4f7dd8342d92..a8fd4b174cf7c 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -1198,8 +1198,6 @@ Code Completion
 
 Static Analyzer
 ---
-- Fixed a crash when C++20 parenthesized initializer lists are used. This issue
-  was causing a crash in clang-tidy. (#GH136041)
 
 New features
 
@@ -1223,20 +1221,99 @@ New features
 - Implemented `P2719R5 Type-aware allocation and deallocation functions 
`_
   as an extension in all C++ language modes.
 
+- Added support for the ``[[clang::assume(cond)]]`` attribute, treating it as
+  ``__builtin_assume(cond)`` for better static analysis. (#GH129234)
+
+- Introduced per-entry-point statistics to provide more detailed analysis 
metrics.
+  Documentation: :doc:`analyzer/developer-docs/Statistics` (#GH131175)
+
+- Added time-trace scopes for high-level analyzer steps to improve performance
+  debugging. Documentation: 
:doc:`analyzer/developer-docs/PerformanceInvestigation`
+  (#GH125508, #GH125884)
+
+- Enhanced the ``check::BlockEntrance`` checker callback to provide more 
granular
+  control over block-level analysis.
+  `Documentation (check::BlockEntrance)
+  `_
+  (#GH140924)
+
+- Added a new experimental checker ``alpha.core.FixedAddressDereference`` to 
detect
+  dereferences of fixed addresses, which can be useful for finding hard-coded 
memory
+  accesses. (#GH127191)
 
 Crash and bug fixes
 ^^^
 
+- Fixed a crash when C++20 parenthesized initializer lists are used.
+  This affected a crash of the well-known lambda overloaded pattern.
+  (#GH136041, #GH135665)
+
+- Dropped an unjustified assertion, that was triggered in 
``BugReporterVisitors.cpp``
+  for variable initialization detection. (#GH125044)
+
 - Fixed a crash in ``UnixAPIMisuseChecker`` and ``MallocChecker`` when 
analyzing
   code with non-standard ``getline`` or ``getdelim`` function signatures. 
(#GH144884)
 
+- Fixed crashes involving ``__builtin_bit_cast``. (#GH139188)
+
+- ``__datasizeof`` (C++) and ``_Countof`` (C) no longer cause a failed 
assertion
+  when given an operand of VLA type. (#GH151711)
+
+- Fixed a crash in ``CastSizeChecker``. (#GH134387)
+
+- Some ``cplusplus.PlacementNew`` false positives were fixed. (#GH150161)
+
 Improvements
 
 
+- Added option to assume at least one iteration in loops to reduce false 
positives.
+  (#GH125494)
+
 - The checker option ``optin.cplusplus.VirtualCall:PureOnly`` was removed,
-  because it had been deprecated since 2019 and it is completely useless (it
-  was kept only for compatibility with pre-2019 versions, setting it to true is
-  equivalent to completely disabling the checker).
+  because it had been deprecated since 2019. (#GH131823)
+
+- Enhanced the ``StackAddrEscapeChecker`` to detect more cases of stack address
+  escapes, including return values for child stack frames. (#GH126620, 
#GH126986)
+
+- Improved the ``BlockInCriticalSectionChecker`` to recognize ``O_NONBLOCK``
+  streams and suppress reports in those cases. (#GH127049)
+
+- Better support for lambda-converted function pointers in analysis. 
(#GH144906)
+
+- Improved modeling of ``getcwd`` function in ``StdCLibraryFunctions`` checker.
+  (#GH141076)
+
+- Enhanced the ``EnumCastOutOfRange`` checker to ignore 
``[[clang::flag_enum]]``
+  enums. (#GH141232)
+
+- Improved handling of structured bindings captured by lambdas. (#GH132579, 
#GH91835)
+
+- Fixed unnamed bitfield handling in ``UninitializedObjectChecker``. 
(#GH132427, #GH132001)
+
+- Enhanced iterator checker modeling for ``insert`` operations. (#GH132596)
+
+- Improved ``format`` attribute handling in ``GenericTaintChecker``. 
(#GH132765)
+
+- Added support for ``consteval`` in ``ConditionBRVisitor::VisitTerminator``.
+  

[llvm-branch-commits] [clang] [analyzer][docs] CSA release notes for clang-21 (PR #154600)

2025-08-21 Thread Balazs Benics via llvm-branch-commits


@@ -1246,6 +1323,9 @@ Moved checkers
   checker ``alpha.security.ArrayBound`` (which was searching for the same kind
   of bugs with an different, simpler and less accurate algorithm) is removed.
 
+- Moved checker ``alpha.core.FixedAddressDereference`` out of the ``alpha`` 
package
+  to ``core.FixedAddressDereference ``. (#GH132404)

steakhal wrote:

```suggestion
  to ``core.FixedAddressDereference``. (#GH132404)
```

https://github.com/llvm/llvm-project/pull/154600


[llvm-branch-commits] [flang] [flang][OpenMP] move omp end sections validation to semantics (PR #154740)

2025-08-21 Thread Tom Eccles via llvm-branch-commits

tblah wrote:

Part 1: https://github.com/llvm/llvm-project/pull/154739

https://github.com/llvm/llvm-project/pull/154740


[llvm-branch-commits] [clang] [Analyzer] No longer crash with VLA operands to unary type traits (PR #154738)

2025-08-21 Thread Balazs Benics via llvm-branch-commits

https://github.com/steakhal milestoned 
https://github.com/llvm/llvm-project/pull/154738


[llvm-branch-commits] [clang] Move rest of documentation problems that found their way to the SA section (PR #154608)

2025-08-21 Thread Erich Keane via llvm-branch-commits

erichkeane wrote:

> > > Dammit, yeah I wish I could still see the conflict to work out how I 
> > > misread it - I suspect the addition of "A new flag - 
> > > `-static-libclosure`..." to new flags didn't conflict :-/
> > 
> > 
> > You can actually, `git show --remerge-diff 
> > 30401b1f918ea359334b507a79118938ffe3c169` 
> > ([docs](https://git-scm.com/docs/git-show#Documentation/git-show.txt---remerge-diff)).
> >  :)
> 
> I think this was a pebkac issue with me using the GH editor to resolve the 
> diff - if I merge locally the change seems obvious, and I can't think of a 
> reason to make this error.

Thank you so much for fixing this :)  I appreciate your help!

https://github.com/llvm/llvm-project/pull/154608


[llvm-branch-commits] [clang] [llvm] [mlir] [OMPIRBuilder] Add support for explicit deallocation points (PR #154752)

2025-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-codegen

Author: Sergio Afonso (skatrak)


Changes

In this patch, some OMPIRBuilder codegen functions and callbacks are updated to 
work with arrays of deallocation insertion points. The purpose of this is to 
enable the replacement of `alloca`s with other types of allocations that 
require explicit deallocations in a way that makes it possible for 
`CodeExtractor` instances created during OMPIRBuilder finalization to also use 
them.

The OpenMP to LLVM IR MLIR translation pass is updated to properly store and 
forward deallocation points together with their matching allocation point to 
the OMPIRBuilder.

Currently, only the `DeviceSharedMemCodeExtractor` uses this feature to get the 
`CodeExtractor` to use device shared memory for intermediate allocations when 
outlining a parallel region inside of a Generic kernel (code path that is only 
used by Flang via MLIR, currently). However, long term this might also be 
useful to refactor finalization of variables with destructors, potentially 
reducing the use of callbacks and simplifying privatization and reductions.

Instead of a single deallocation point, lists of those are used. This is to 
cover cases where there are multiple exit blocks originating from a single 
entry. If an allocation needing explicit deallocation is placed in the entry 
block of such cases, it would need to be deallocated before each of the exits.
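
The multiple-exit case can be sketched without any LLVM types. The following is only an illustration under stated assumptions: the `CFG` alias and the `deallocBlocks` helper are hypothetical, a block with no successors is treated as an exit, and nothing here reflects the actual OMPIRBuilder or `CodeExtractor` API.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical miniature CFG: block i lists the indices of its successors.
// A block with no successors is treated as an exit block.
using CFG = std::vector<std::vector<std::size_t>>;

// Returns every exit block reachable from `entry`, in ascending order. An
// allocation placed in `entry` that needs explicit deallocation would need
// one deallocation point in each of these blocks.
std::vector<std::size_t> deallocBlocks(const CFG &cfg, std::size_t entry) {
  std::vector<bool> seen(cfg.size(), false);
  std::vector<std::size_t> worklist{entry};
  std::vector<std::size_t> exits;
  seen[entry] = true;
  while (!worklist.empty()) {
    std::size_t block = worklist.back();
    worklist.pop_back();
    if (cfg[block].empty()) {
      exits.push_back(block);
      continue;
    }
    for (std::size_t succ : cfg[block]) {
      if (!seen[succ]) {
        seen[succ] = true;
        worklist.push_back(succ);
      }
    }
  }
  std::sort(exits.begin(), exits.end());
  return exits;
}
```

For a region whose entry block 0 branches to blocks 1 and 2, which in turn fall through to the distinct exits 3 and 4, the helper reports two blocks. A single deallocation point could not cover both paths, which is why a list is passed around.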

---

Patch is 143.00 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/154752.diff


15 Files Affected:

- (modified) clang/lib/CodeGen/CGOpenMPRuntime.cpp (+2-2) 
- (modified) clang/lib/CodeGen/CGStmtOpenMP.cpp (+39-30) 
- (modified) llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h (+59-40) 
- (modified) llvm/include/llvm/Transforms/Utils/CodeExtractor.h (+12-12) 
- (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+136-162) 
- (modified) llvm/lib/Transforms/IPO/HotColdSplitting.cpp (+1-1) 
- (modified) llvm/lib/Transforms/IPO/IROutliner.cpp (+2-2) 
- (modified) llvm/lib/Transforms/IPO/OpenMPOpt.cpp (+7-4) 
- (modified) llvm/lib/Transforms/Utils/CodeExtractor.cpp (+20-17) 
- (modified) llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp (+193-132) 
- (modified) llvm/unittests/Transforms/Utils/CodeExtractorTest.cpp (+1-1) 
- (modified) 
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+137-95) 
- (modified) mlir/test/Target/LLVMIR/omptarget-parallel-llvm.mlir (+9-9) 
- (modified) mlir/test/Target/LLVMIR/omptarget-region-device-llvm.mlir (+3-1) 
- (modified) mlir/test/Target/LLVMIR/openmp-target-private-allocatable.mlir 
(+2) 


``diff
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp 
b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index f98339d472fa9..f0cb7531845c8 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -10500,8 +10500,8 @@ void CGOpenMPRuntime::emitTargetDataCalls(
   llvm::OpenMPIRBuilder::LocationDescription OmpLoc(CodeGenIP);
   llvm::OpenMPIRBuilder::InsertPointTy AfterIP =
   cantFail(OMPBuilder.createTargetData(
-  OmpLoc, AllocaIP, CodeGenIP, DeviceID, IfCondVal, Info, GenMapInfoCB,
-  CustomMapperCB,
+  OmpLoc, AllocaIP, CodeGenIP, /*DeallocIPs=*/{}, DeviceID, IfCondVal,
+  Info, GenMapInfoCB, CustomMapperCB,
   /*MapperFunc=*/nullptr, BodyCB, DeviceAddrCB, RTLoc));
   CGF.Builder.restoreIP(AfterIP);
 }
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp 
b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index f6a0ca574a191..959c66593b157 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -1835,10 +1835,10 @@ void CodeGenFunction::EmitOMPParallelDirective(const 
OMPParallelDirective &S) {
 const CapturedStmt *CS = S.getCapturedStmt(OMPD_parallel);
 const Stmt *ParallelRegionBodyStmt = CS->getCapturedStmt();
 
-auto BodyGenCB = [&, this](InsertPointTy AllocaIP,
-   InsertPointTy CodeGenIP) {
+auto BodyGenCB = [&, this](InsertPointTy AllocIP, InsertPointTy CodeGenIP,
+   ArrayRef DeallocIPs) {
   OMPBuilderCBHelpers::EmitOMPOutlinedRegionBody(
-  *this, ParallelRegionBodyStmt, AllocaIP, CodeGenIP, "parallel");
+  *this, ParallelRegionBodyStmt, AllocIP, CodeGenIP, "parallel");
   return llvm::Error::success();
 };
 
@@ -1846,9 +1846,10 @@ void CodeGenFunction::EmitOMPParallelDirective(const 
OMPParallelDirective &S) {
 CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(*this, &CGSI);
 llvm::OpenMPIRBuilder::InsertPointTy AllocaIP(
 AllocaInsertPt->getParent(), AllocaInsertPt->getIterator());
-llvm::OpenMPIRBuilder::InsertPointTy AfterIP = cantFail(
-OMPBuilder.createParallel(Builder, AllocaIP, BodyGenCB, PrivCB, FiniCB,
-  IfCond, NumThreads, ProcBind, 
S.hasCancel()));
+llvm::OpenMPIRBuilder::InsertPointTy AfterIP =
+cantFail(OMPBuil

[llvm-branch-commits] [clang] [llvm] [mlir] [OMPIRBuilder] Add support for explicit deallocation points (PR #154752)

2025-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-mlir

Author: Sergio Afonso (skatrak)



[llvm-branch-commits] [clang] [HLSL][DirectX] Add the Qdx-rootsignature-strip driver option (PR #154454)

2025-08-21 Thread Chris B via llvm-branch-commits


@@ -76,9 +76,10 @@ class Action {
 StaticLibJobClass,
 BinaryAnalyzeJobClass,
 BinaryTranslatorJobClass,
+BinaryModifyJobClass,

llvm-beanz wrote:

Probably better to align this with the specific tool you're running:
```suggestion
ObjcopyJobClass,
```

That is consistent with most of the other toolchain jobs (e.g. LipoJobClass, 
DsymutilJobClass, LinkerWrapperJobClass, etc.).

https://github.com/llvm/llvm-project/pull/154454


[llvm-branch-commits] [llvm] [AArch64] Split large loop dependence masks (PR #153187)

2025-08-21 Thread Sander de Smalen via llvm-branch-commits


@@ -5248,49 +5248,94 @@ 
AArch64TargetLowering::LowerLOOP_DEPENDENCE_MASK(SDValue Op,
  SelectionDAG &DAG) const {
   SDLoc DL(Op);
   uint64_t EltSize = Op.getConstantOperandVal(2);
-  EVT VT = Op.getValueType();
+  EVT FullVT = Op.getValueType();
+  unsigned NumElements = FullVT.getVectorMinNumElements();
+  unsigned NumSplits = 0;
+  EVT EltVT;
   switch (EltSize) {
   case 1:
-if (VT != MVT::v16i8 && VT != MVT::nxv16i1)
-  return SDValue();
+EltVT = MVT::i8;
 break;
   case 2:
-if (VT != MVT::v8i8 && VT != MVT::nxv8i1)
-  return SDValue();
+if (NumElements >= 16)

sdesmalen-arm wrote:

When the number of elements is smaller than that, not returning `SDValue()` 
results in SelectionDAG failures, e.g.

```
define <vscale x 4 x i1> @whilewr_64_split(ptr %a, ptr %b) {
entry:
  %0 = call <vscale x 4 x i1> @llvm.loop.dependence.war.mask.nxv4i1(ptr %a, ptr %b, i64 1)
  ret <vscale x 4 x i1> %0
}
```

There seems to be missing test-coverage for that.
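
The kind of guard I have in mind can be sketched as a standalone function. This is a hypothetical helper, not the code under review: `nativeMasksNeeded` is an invented name, and it assumes that one native mask covers `16 / EltSize` lanes, which matches the types the old code accepted (16 lanes for element size 1, 8 lanes for element size 2).

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper, not the code under review. Assumes that one native
// mask covers 16 / eltSize lanes.
std::optional<unsigned> nativeMasksNeeded(std::uint64_t eltSize,
                                          unsigned numElements) {
  if (eltSize != 1 && eltSize != 2 && eltSize != 4 && eltSize != 8)
    return std::nullopt;
  unsigned nativeLanes = 16 / static_cast<unsigned>(eltSize);
  // Fewer lanes than one native mask (or not a whole multiple): bail out.
  // In the lowering this would correspond to returning SDValue().
  if (numElements < nativeLanes || numElements % nativeLanes != 0)
    return std::nullopt;
  return numElements / nativeLanes;
}
```

For the example above (element size 1, 4 lanes) this yields no value, so the lowering would bail out, whereas 32 lanes of element size 1 would map to two native masks.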

https://github.com/llvm/llvm-project/pull/153187


[llvm-branch-commits] [clang] [analyzer][docs] CSA release notes for clang-21 (PR #154600)

2025-08-21 Thread Arseniy Zaostrovnykh via llvm-branch-commits


@@ -1223,20 +1221,99 @@ New features
 - Implemented `P2719R5 Type-aware allocation and deallocation functions 
`_
   as an extension in all C++ language modes.
 
+- Added support for the ``[[clang::assume(cond)]]`` attribute, treating it as
+  ``__builtin_assume(cond)`` for better static analysis. (#GH129234)
+
+- Introduced per-entry-point statistics to provide more detailed analysis 
metrics.
+  Documentation: :doc:`analyzer/developer-docs/Statistics` (#GH131175)
+
+- Added time-trace scopes for high-level analyzer steps to improve performance
+  debugging. Documentation: 
:doc:`analyzer/developer-docs/PerformanceInvestigation`
+  (#GH125508, #GH125884)
+
+- Enhanced the ``check::BlockEntrance`` checker callback to provide more 
granular
+  control over block-level analysis.
+  `Documentation (check::BlockEntrance)
+  `_
+  (#GH140924)
+
+- Added a new experimental checker ``alpha.core.FixedAddressDereference`` to 
detect
+  dereferences of fixed addresses, which can be useful for finding hard-coded 
memory
+  accesses. (#GH127191)

necto wrote:

Later in "Moved Checkers" it is mentioned to be moved to stable:
> - Moved checker ``alpha.core.FixedAddressDereference`` out of the ``alpha`` 
> package
  to ``core.FixedAddressDereference``. (#GH132404)
 
Should these two entries be combined?

https://github.com/llvm/llvm-project/pull/154600


[llvm-branch-commits] [clang] Move rest of documentation problems that found their way to the SA sec. (PR #154608)

2025-08-21 Thread Balazs Benics via llvm-branch-commits

https://github.com/steakhal updated 
https://github.com/llvm/llvm-project/pull/154608

>From c522688652800329bf5beef9c378192826521f0d Mon Sep 17 00:00:00 2001
From: erichkeane 
Date: Wed, 20 Aug 2025 13:37:33 -0700
Subject: [PATCH 1/2] Move rest of documentation problems that found their way
 to the SA sec.

It was brought up in response to #154605 that these two were in the
wrong place as well!  This patch tries to find better places for them,
  and moves them.
---
 clang/docs/ReleaseNotes.rst | 27 +++
 1 file changed, 7 insertions(+), 20 deletions(-)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index f4f7dd8342d92..0745c6117cbea 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -123,6 +123,8 @@ C++ Language Changes
   a perfect match (all conversion sequences are identity conversions) template 
candidates are not instantiated.
   Diagnostics that would have resulted from the instantiation of these 
template candidates are no longer
   produced. This aligns Clang closer to the behavior of GCC, and fixes 
(#GH62096), (#GH74581), and (#GH74581).
+- Implemented `P2719R5 Type-aware allocation and deallocation functions 
`_
+  as an extension in all C++ language modes.
 
 C++2c Feature Support
 ^
@@ -378,6 +380,11 @@ New Compiler Flags
 
 - New options ``-fthinlto-distributor=`` and ``-Xthinlto-distributor=`` added 
for Integrated Distributed ThinLTO (DTLTO). DTLTO enables the distribution of 
backend ThinLTO compilations via external distribution systems, such as 
Incredibuild, during the traditional link step. (#GH147265, `ThinLTODocs 
`_).
 
+- A new flag - `-static-libclosure` was introduced to support statically 
linking
+  the runtime for the Blocks extension on Windows. This flag currently only
+  changes the code generation, and even then, only on Windows. This does not
+  impact the linker behaviour like the other `-static-*` flags.
+
 Deprecated Compiler Flags
 -
 
@@ -1204,26 +1211,6 @@ Static Analyzer
 New features
 
 
-- A new flag - `-static-libclosure` was introduced to support statically 
linking
-  the runtime for the Blocks extension on Windows. This flag currently only
-  changes the code generation, and even then, only on Windows. This does not
-  impact the linker behaviour like the other `-static-*` flags.
-- OpenACC support, enabled via `-fopenacc` has reached a level of completeness
-  to finally be at least notionally usable. Currently, the OpenACC 3.4
-  specification has been completely implemented for Sema and AST creation, so
-  nodes will show up in the AST after having been properly checked. Lowering is
-  currently a work in progress, with compute, loop, and combined constructs
-  partially implemented, plus a handful of data and executable constructs
-  implemented. Lowering will only work in Clang-IR mode (so only with a 
compiler
-  built with Clang-IR enabled, and with `-fclangir` used on the command line).
-  However, note that the Clang-IR implementation status is also quite partial,
-  so frequent 'not yet implemented' diagnostics should be expected.  Also, the
-  ACC MLIR dialect does not currently implement any lowering to LLVM-IR, so no
-  code generation is possible for OpenACC.
-- Implemented `P2719R5 Type-aware allocation and deallocation functions 
`_
-  as an extension in all C++ language modes.
-
-
 Crash and bug fixes
 ^^^
 

>From ce097553f86a252c0d463e9fa924214fc8b9f091 Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Thu, 21 Aug 2025 13:39:28 +0200
Subject: [PATCH 2/2] NFC Drop the now duplicate paragraph; keep the original
 under "New Compiler Flags"

---
 clang/docs/ReleaseNotes.rst | 5 -
 1 file changed, 5 deletions(-)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 8fe6a8322999a..e33fb4dae1b25 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -1240,11 +1240,6 @@ Static Analyzer
 New features
 
 
-- A new flag - `-static-libclosure` was introduced to support statically 
linking
-  the runtime for the Blocks extension on Windows. This flag currently only
-  changes the code generation, and even then, only on Windows. This does not
-  impact the linker behaviour like the other `-static-*` flags.
-
 Crash and bug fixes
 ^^^
 



[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-08-21 Thread via llvm-branch-commits


@@ -2979,39 +2979,61 @@ static int getMapDataMemberIdx(MapInfoData &mapData, 
omp::MapInfoOp memberOp) {
   return std::distance(mapData.MapClause.begin(), res);
 }
 
-static omp::MapInfoOp getFirstOrLastMappedMemberPtr(omp::MapInfoOp mapInfo,
-bool first) {
-  ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
-  // Only 1 member has been mapped, we can return it.
-  if (indexAttr.size() == 1)
-return cast(mapInfo.getMembers()[0].getDefiningOp());
+static void sortMapIndices(llvm::SmallVector &indices,

agozillon wrote:

Would be good to have as a general follow-up PR in the near future; I think the 
namespace usage varies quite a bit in this file.

https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [flang] [llvm] [mlir] [MLIR][OpenMP] Introduce overlapped record type map support (PR #119588)

2025-08-21 Thread via llvm-branch-commits


@@ -2979,39 +2979,61 @@ static int getMapDataMemberIdx(MapInfoData &mapData, 
omp::MapInfoOp memberOp) {
   return std::distance(mapData.MapClause.begin(), res);
 }
 
-static omp::MapInfoOp getFirstOrLastMappedMemberPtr(omp::MapInfoOp mapInfo,
-bool first) {
-  ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
-  // Only 1 member has been mapped, we can return it.
-  if (indexAttr.size() == 1)
-return cast(mapInfo.getMembers()[0].getDefiningOp());
+static void sortMapIndices(llvm::SmallVector &indices,

agozillon wrote:

Or, at least it did last I checked! But haven't paid too much attention 
recently :D There's a chance @skatrak cleaned it up at some point in one of his 
refactors.

https://github.com/llvm/llvm-project/pull/119588


[llvm-branch-commits] [llvm] AMDGPU: Allow folding multiple uses of some immediates into copies (PR #154757)

2025-08-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/154757


[llvm-branch-commits] [llvm] AMDGPU: Allow folding multiple uses of some immediates into copies (PR #154757)

2025-08-21 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on 
> Graphite.

* **#154757** 👈 (View in Graphite)
* **#154682**
* `main`

This stack of pull requests is managed by Graphite.


https://github.com/llvm/llvm-project/pull/154757


[llvm-branch-commits] [llvm] AMDGPU: Allow folding multiple uses of some immediates into copies (PR #154757)

2025-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

In some cases this will require an avoidable re-defining of
a register, but it works out better most of the time. Also allow
folding 64-bit immediates into subregister extracts, unless it would
break an inline constant.

We could be more aggressive here, but this set of conditions seems
to do a reasonable job without introducing too many regressions.

---

Patch is 453.67 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/154757.diff


46 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+24-3) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.interp.inreg.ll 
(+6-6) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/mubuf-global.ll (+9-11) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll (+15-11) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll (+26-26) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll (+10-7) 
- (modified) llvm/test/CodeGen/AMDGPU/addrspacecast-gas.ll (+3-2) 
- (modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll (+80-80) 
- (modified) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll (+11-11) 
- (modified) llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll 
(+7-7) 
- (modified) llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll 
(+178-178) 
- (modified) 
llvm/test/CodeGen/AMDGPU/dagcomb-extract-vec-elt-different-sizes.ll (+18-18) 
- (modified) llvm/test/CodeGen/AMDGPU/dagcombine-fmul-sel.ll (+76-36) 
- (modified) llvm/test/CodeGen/AMDGPU/div_i128.ll (+56-56) 
- (modified) llvm/test/CodeGen/AMDGPU/div_v2i128.ll (+555-555) 
- (modified) llvm/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll 
(+15-13) 
- (modified) llvm/test/CodeGen/AMDGPU/extract_vector_elt-f16.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/extract_vector_elt-i16.ll (+4-4) 
- (modified) llvm/test/CodeGen/AMDGPU/fmul-to-ldexp.ll (+29-20) 
- (modified) llvm/test/CodeGen/AMDGPU/fptoi.i128.ll (+196-194) 
- (modified) llvm/test/CodeGen/AMDGPU/fsqrt.f64.ll (+32-48) 
- (modified) llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll 
(+5-6) 
- (modified) llvm/test/CodeGen/AMDGPU/iglp-no-clobber.ll (+1-1) 
- (modified) 
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.AFLCustomIRMutator.opt.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.ll (+4-4) 
- (modified) llvm/test/CodeGen/AMDGPU/llvm.frexp.ll (+62-33) 
- (modified) llvm/test/CodeGen/AMDGPU/mad-combine.ll (+9-9) 
- (modified) llvm/test/CodeGen/AMDGPU/masked-load-vectortypes.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/mul_uint24-amdgcn.ll (+1-1) 
- (added) llvm/test/CodeGen/AMDGPU/peephole-fold-imm-multi-use.mir (+94) 
- (modified) llvm/test/CodeGen/AMDGPU/rem_i128.ll (+116-116) 
- (modified) llvm/test/CodeGen/AMDGPU/roundeven.ll (+6-6) 
- (modified) llvm/test/CodeGen/AMDGPU/rsq.f64.ll (+166-186) 
- (modified) llvm/test/CodeGen/AMDGPU/sdiv64.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/shift-and-i64-ubfe.ll (+3-3) 
- (modified) llvm/test/CodeGen/AMDGPU/sint_to_fp.f64.ll (+21-21) 
- (modified) llvm/test/CodeGen/AMDGPU/spill-agpr.ll (+116-116) 
- (modified) llvm/test/CodeGen/AMDGPU/srem64.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/srl.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/subreg-coalescer-crash.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/udiv64.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/uint_to_fp.f64.ll (+21-21) 
- (modified) llvm/test/CodeGen/AMDGPU/undef-handling-crash-in-ra.ll (+19-21) 
- (modified) llvm/test/CodeGen/AMDGPU/urem64.ll (+47-49) 
- (modified) llvm/test/CodeGen/AMDGPU/v_cndmask.ll (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/valu-i1.ll (+1-1) 


```diff
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 75b303086163b..1be8d99834f93 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -3559,13 +3559,12 @@ static unsigned getNewFMAMKInst(const GCNSubtarget &ST, 
unsigned Opc) {
 
 bool SIInstrInfo::foldImmediate(MachineInstr &UseMI, MachineInstr &DefMI,
 Register Reg, MachineRegisterInfo *MRI) const {
-  if (!MRI->hasOneNonDBGUse(Reg))
-return false;
-
   int64_t Imm;
   if (!getConstValDefinedInReg(DefMI, Reg, Imm))
 return false;
 
+  const bool HasMultipleUses = !MRI->hasOneNonDBGUse(Reg);
+
   assert(!DefMI.getOperand(0).getSubReg() && "Expected SSA form");
 
   unsigned Opc = UseMI.getOpcode();
@@ -3577,6 +3576,25 @@ bool SIInstrInfo::foldImmediate(MachineInstr &UseMI, 
MachineInstr &DefMI,
 
 const TargetRegisterClass *DstRC = RI.getRegClassForReg(*MRI, DstReg);
 
+if (HasMultipleUses) {
+  // TODO: This should fold in more cases with multiple use, but we need to
+  // more carefully consider what those uses are.
+  unsigned ImmDefSize = RI.getRegSizeInBits(*MRI->getRegClass(Reg));
+
```
+

[llvm-branch-commits] [flang] [flang][OpenMP] move omp end sections validation to semantics (PR #154740)

2025-08-21 Thread Krzysztof Parzyszek via llvm-branch-commits


@@ -4903,8 +4903,11 @@ struct OpenMPSectionsConstruct {
   CharBlock source;
   // Each of the OpenMPConstructs in the list below contains an
   // OpenMPSectionConstruct. This is guaranteed by the parser.
+  // The end sections directive is optional here because it is difficult to
+  // generate helpful error messages for a missing end directive wihtin the

kparzysz wrote:

Typo: within

https://github.com/llvm/llvm-project/pull/154740


[llvm-branch-commits] [clang] [Analyzer] No longer crash with VLA operands to unary type traits (PR #154738)

2025-08-21 Thread Balazs Benics via llvm-branch-commits

https://github.com/steakhal created 
https://github.com/llvm/llvm-project/pull/154738

sizeof was handled correctly, but __datasizeof and _Countof were not.

Fixes #151711

(cherry picked from commit 17327482f045b7119e116320db3e9c12fcf250ae with 
adjustments)
Dropping the ReleaseNotes part of the original patch.

The Static Analyzer release notes section will mention this patch in #154600

>From 656763c898bff7783d87ed7d17c3050c631fe06d Mon Sep 17 00:00:00 2001
From: Aaron Ballman 
Date: Fri, 1 Aug 2025 12:31:56 -0400
Subject: [PATCH] [Analyzer] No longer crash with VLA operands to unary type
 traits (#151719)

sizeof was handled correctly, but __datasizeof and _Countof were not.

Fixes #151711

(cherry picked from commit 17327482f045b7119e116320db3e9c12fcf250ae with 
adjustments)
Dropping the ReleaseNotes part of the original patch.
---
 clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp |  3 ++-
 clang/test/Analysis/engine/gh151711.cpp   | 18 ++
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/Analysis/engine/gh151711.cpp

diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp 
b/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
index fa8e669b6bb2f..ab29f86cec326 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
@@ -916,7 +916,8 @@ VisitUnaryExprOrTypeTraitExpr(const 
UnaryExprOrTypeTraitExpr *Ex,
   QualType T = Ex->getTypeOfArgument();
 
   for (ExplodedNode *N : CheckedSet) {
-if (Ex->getKind() == UETT_SizeOf) {
+if (Ex->getKind() == UETT_SizeOf || Ex->getKind() == UETT_DataSizeOf ||
+Ex->getKind() == UETT_CountOf) {
   if (!T->isIncompleteType() && !T->isConstantSizeType()) {
 assert(T->isVariableArrayType() && "Unknown non-constant-sized type.");
 
diff --git a/clang/test/Analysis/engine/gh151711.cpp 
b/clang/test/Analysis/engine/gh151711.cpp
new file mode 100644
index 0..a9950a7a3b9d0
--- /dev/null
+++ b/clang/test/Analysis/engine/gh151711.cpp
@@ -0,0 +1,18 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -verify 
%s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -verify 
-x c %s
+
+void clang_analyzer_dump(int);
+
+// Ensure that VLA types are correctly handled by unary type traits in the
+// expression engine. Previously, __datasizeof and _Countof both caused failed
+// assertions.
+void gh151711(int i) {
+  clang_analyzer_dump(sizeof(int[i++]));   // expected-warning {{Unknown}}
+#ifdef __cplusplus
+  // __datasizeof is only available in C++.
+  clang_analyzer_dump(__datasizeof(int[i++])); // expected-warning {{Unknown}}
+#else
+  // _Countof is only available in C.
+  clang_analyzer_dump(_Countof(int[i++])); // expected-warning {{Unknown}}
+#endif
+}



[llvm-branch-commits] [clang] [Analyzer] No longer crash with VLA operands to unary type traits (PR #154738)

2025-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balazs Benics (steakhal)


Changes

sizeof was handled correctly, but __datasizeof and _Countof were not.

Fixes #151711

(cherry picked from commit 17327482f045b7119e116320db3e9c12fcf250ae with 
adjustments)
Dropping the ReleaseNotes part of the original patch.

The Static Analyzer release notes section will mention this patch in #154600

---
Full diff: https://github.com/llvm/llvm-project/pull/154738.diff


2 Files Affected:

- (modified) clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp (+2-1) 
- (added) clang/test/Analysis/engine/gh151711.cpp (+18) 


```diff
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp 
b/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
index fa8e669b6bb2f..ab29f86cec326 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp
@@ -916,7 +916,8 @@ VisitUnaryExprOrTypeTraitExpr(const 
UnaryExprOrTypeTraitExpr *Ex,
   QualType T = Ex->getTypeOfArgument();
 
   for (ExplodedNode *N : CheckedSet) {
-if (Ex->getKind() == UETT_SizeOf) {
+if (Ex->getKind() == UETT_SizeOf || Ex->getKind() == UETT_DataSizeOf ||
+Ex->getKind() == UETT_CountOf) {
   if (!T->isIncompleteType() && !T->isConstantSizeType()) {
 assert(T->isVariableArrayType() && "Unknown non-constant-sized type.");
 
diff --git a/clang/test/Analysis/engine/gh151711.cpp 
b/clang/test/Analysis/engine/gh151711.cpp
new file mode 100644
index 0..a9950a7a3b9d0
--- /dev/null
+++ b/clang/test/Analysis/engine/gh151711.cpp
@@ -0,0 +1,18 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -verify 
%s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -verify 
-x c %s
+
+void clang_analyzer_dump(int);
+
+// Ensure that VLA types are correctly handled by unary type traits in the
+// expression engine. Previously, __datasizeof and _Countof both caused failed
+// assertions.
+void gh151711(int i) {
+  clang_analyzer_dump(sizeof(int[i++]));   // expected-warning {{Unknown}}
+#ifdef __cplusplus
+  // __datasizeof is only available in C++.
+  clang_analyzer_dump(__datasizeof(int[i++])); // expected-warning {{Unknown}}
+#else
+  // _Countof is only available in C.
+  clang_analyzer_dump(_Countof(int[i++])); // expected-warning {{Unknown}}
+#endif
+}

```




https://github.com/llvm/llvm-project/pull/154738


[llvm-branch-commits] [llvm] AMDGPU: Add baseline test for unspilling VGPRs after MFMA rewrite (PR #154322)

2025-08-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/154322

>From 883e110c8f86719a810c4d5a1930434af532194c Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 19 Aug 2025 21:29:05 +0900
Subject: [PATCH] AMDGPU: Add baseline test for unspilling VGPRs after MFMA
 rewrite

Test for #154260
---
 .../unspill-vgpr-after-rewrite-vgpr-mfma.ll   | 454 ++
 1 file changed, 454 insertions(+)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/unspill-vgpr-after-rewrite-vgpr-mfma.ll

diff --git a/llvm/test/CodeGen/AMDGPU/unspill-vgpr-after-rewrite-vgpr-mfma.ll 
b/llvm/test/CodeGen/AMDGPU/unspill-vgpr-after-rewrite-vgpr-mfma.ll
new file mode 100644
index 0..122d46b39ff32
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/unspill-vgpr-after-rewrite-vgpr-mfma.ll
@@ -0,0 +1,454 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -amdgpu-mfma-vgpr-form < %s 
| FileCheck %s
+
+; After reassigning the MFMA to use AGPRs, we've alleviated enough
+; register pressure to try eliminating the spill of %spill with the freed
+; up VGPR.
+define void @eliminate_spill_after_mfma_rewrite(i32 %x, i32 %y, <4 x i32> 
%arg, ptr addrspace(1) inreg %ptr) #0 {
+; CHECK-LABEL: eliminate_spill_after_mfma_rewrite:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:v_accvgpr_write_b32 a3, v5
+; CHECK-NEXT:v_accvgpr_write_b32 a2, v4
+; CHECK-NEXT:v_accvgpr_write_b32 a1, v3
+; CHECK-NEXT:v_accvgpr_write_b32 a0, v2
+; CHECK-NEXT:buffer_store_dword v40, off, s[0:3], s32 offset:188 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v41, off, s[0:3], s32 offset:184 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v42, off, s[0:3], s32 offset:180 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v43, off, s[0:3], s32 offset:176 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v44, off, s[0:3], s32 offset:172 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v45, off, s[0:3], s32 offset:168 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v46, off, s[0:3], s32 offset:164 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v47, off, s[0:3], s32 offset:160 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v56, off, s[0:3], s32 offset:156 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v57, off, s[0:3], s32 offset:152 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v58, off, s[0:3], s32 offset:148 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v59, off, s[0:3], s32 offset:144 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v60, off, s[0:3], s32 offset:140 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v61, off, s[0:3], s32 offset:136 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v62, off, s[0:3], s32 offset:132 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v63, off, s[0:3], s32 offset:128 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a32, off, s[0:3], s32 offset:124 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a33, off, s[0:3], s32 offset:120 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a34, off, s[0:3], s32 offset:116 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a35, off, s[0:3], s32 offset:112 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a36, off, s[0:3], s32 offset:108 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a37, off, s[0:3], s32 offset:104 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a38, off, s[0:3], s32 offset:100 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a39, off, s[0:3], s32 offset:96 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a40, off, s[0:3], s32 offset:92 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a41, off, s[0:3], s32 offset:88 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a42, off, s[0:3], s32 offset:84 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a43, off, s[0:3], s32 offset:80 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a44, off, s[0:3], s32 offset:76 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a45, off, s[0:3], s32 offset:72 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a46, off, s[0:3], s32 offset:68 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a47, off, s[0:3], s32 offset:64 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a48, off, s[0:3], s32 offset:60 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a49, off, s[0:3], s32 offset:56 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a50, off, s[0:3], s32 offset:52 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a51, off, s[0:3], s32 offset:48 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a52, off, s[0:3], 

[llvm-branch-commits] [clang] d72ad36 - Revert "[C++] Expose nullptr_t from stddef.h in C++ mode (#154599)"

2025-08-21 Thread via llvm-branch-commits

Author: Aaron Ballman
Date: 2025-08-21T10:01:23-04:00
New Revision: d72ad36f407f28cf8b3f23f3a8bdb835eb7b8776

URL: 
https://github.com/llvm/llvm-project/commit/d72ad36f407f28cf8b3f23f3a8bdb835eb7b8776
DIFF: 
https://github.com/llvm/llvm-project/commit/d72ad36f407f28cf8b3f23f3a8bdb835eb7b8776.diff

LOG: Revert "[C++] Expose nullptr_t from stddef.h in C++ mode (#154599)"

This reverts commit 7d167f45643b37a627e2aef49f718a5a2debd5d3.

Added: 


Modified: 
clang/docs/ReleaseNotes.rst
clang/lib/Headers/__stddef_nullptr_t.h
clang/test/Headers/stddefneeds.cpp

Removed: 




diff  --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index c32102d102cd3..fe1dd15c6f885 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -229,7 +229,6 @@ Bug Fixes in This Version
   cast chain. (#GH149967).
 - Fixed a crash with incompatible pointer to integer conversions in designated
   initializers involving string literals. (#GH154046)
-- Clang's <stddef.h> now properly declares ``nullptr_t`` in C++ mode. 
(#GH154577).
 
 Bug Fixes to Compiler Builtins
 ^^

diff  --git a/clang/lib/Headers/__stddef_nullptr_t.h 
b/clang/lib/Headers/__stddef_nullptr_t.h
index c84b7bc2dc198..7f3fbe6fe0d3a 100644
--- a/clang/lib/Headers/__stddef_nullptr_t.h
+++ b/clang/lib/Headers/__stddef_nullptr_t.h
@@ -16,8 +16,7 @@
 #define _NULLPTR_T
 
 #ifdef __cplusplus
-#if __cplusplus >= 201103L ||  
\
-(defined(_MSC_EXTENSIONS) && defined(_NATIVE_NULLPTR_SUPPORTED))
+#if defined(_MSC_EXTENSIONS) && defined(_NATIVE_NULLPTR_SUPPORTED)
 namespace std {
 typedef decltype(nullptr) nullptr_t;
 }

diff  --git a/clang/test/Headers/stddefneeds.cpp 
b/clang/test/Headers/stddefneeds.cpp
index 6e8829bc1be67..0282e8afa600d 100644
--- a/clang/test/Headers/stddefneeds.cpp
+++ b/clang/test/Headers/stddefneeds.cpp
@@ -1,12 +1,10 @@
 // RUN: %clang_cc1 -fsyntax-only -triple x86_64-apple-macosx10.9.0 -verify 
-Wsentinel -std=c++11 %s
-// RUN: %clang_cc1 -fsyntax-only -triple x86_64-apple-macosx10.9.0 
-verify=old,expected -Wsentinel -std=c++98 %s
 
 ptrdiff_t p0; // expected-error{{unknown}}
 size_t s0; // expected-error{{unknown}}
 void* v0 = NULL; // expected-error{{undeclared}}
 wint_t w0; // expected-error{{unknown}}
 max_align_t m0; // expected-error{{unknown}}
-nullptr_t n0; // expected-error {{unknown}}
 
 #define __need_ptrdiff_t
 #include <stddef.h>
@@ -16,7 +14,6 @@ size_t s1; // expected-error{{unknown}}
 void* v1 = NULL; // expected-error{{undeclared}}
 wint_t w1; // expected-error{{unknown}}
 max_align_t m1; // expected-error{{unknown}}
-nullptr_t n1; // expected-error{{unknown}}
 
 #define __need_size_t
 #include <stddef.h>
@@ -26,16 +23,6 @@ size_t s2;
 void* v2 = NULL; // expected-error{{undeclared}}
 wint_t w2; // expected-error{{unknown}}
 max_align_t m2; // expected-error{{unknown}}
-nullptr_t n2; // expected-error{{unknown}}
-
-#define __need_nullptr_t
-#include <stddef.h>
-ptrdiff_t p6;
-size_t s6;
-void* v6 = NULL; // expected-error{{undeclared}}
-wint_t w6; // expected-error{{unknown}}
-max_align_t m6; // expected-error{{unknown}}
-nullptr_t n6; // old-error{{unknown}}
 
 #define __need_NULL
 #include <stddef.h>
@@ -45,16 +32,6 @@ size_t s3;
 void* v3 = NULL;
 wint_t w3; // expected-error{{unknown}}
 max_align_t m3; // expected-error{{unknown}}
-nullptr_t n3; // old-error{{unknown}}
-
-#define __need_max_align_t
-#include <stddef.h>
-ptrdiff_t p7;
-size_t s7;
-void* v7 = NULL;
-wint_t w7; // expected-error{{unknown}}
-max_align_t m7;
-nullptr_t n7; // old-error{{unknown}}
 
 // Shouldn't bring in wint_t by default:
 #include <stddef.h>
@@ -64,7 +41,6 @@ size_t s4;
 void* v4 = NULL;
 wint_t w4; // expected-error{{unknown}}
 max_align_t m4;
-nullptr_t n4; // old-error{{unknown}}
 
 #define __need_wint_t
 #include <stddef.h>
@@ -74,7 +50,7 @@ size_t s5;
 void* v5 = NULL;
 wint_t w5;
 max_align_t m5;
-nullptr_t n5; // old-error{{unknown}}
+
 
 // linux/stddef.h does something like this for cpp files:
 #undef NULL



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][gfx1250] Implement SIMemoryLegalizer (PR #154726)

2025-08-21 Thread Matt Arsenault via llvm-branch-commits


@@ -587,7 +587,11 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
   SIAtomicScope Scope, SIAtomicAddrSpace AddrSpace) const;
 
 public:
-  SIGfx12CacheControl(const GCNSubtarget &ST) : SIGfx11CacheControl(ST) {}
+  SIGfx12CacheControl(const GCNSubtarget &ST) : SIGfx11CacheControl(ST) {
+// GFX12.0 and GFX12.5 memory models greatly overlap, and in some cases
+// the behavior is the same if assuming GFX12.0 in CU mode.
+assert(ST.hasGFX1250Insts() ? ST.isCuModeEnabled() : true);

arsenm wrote:

```suggestion
assert(ST.hasGFX1250Insts() || ST.isCuModeEnabled());
```
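
When simplifying an assertion like this, a quick truth-table check helps confirm that the rewrite keeps the same meaning. The sketch below is only an illustration: `A` and `B` are stand-ins for `ST.hasGFX1250Insts()` and `ST.isCuModeEnabled()`, and all function names are made up for the example.

```cpp
#include <cassert>

// A stands in for ST.hasGFX1250Insts(), B for ST.isCuModeEnabled().

// The form used in the patch: B is only required when A holds.
bool ternaryForm(bool A, bool B) { return A ? B : true; }

// Material implication, written without the ternary.
bool implicationForm(bool A, bool B) { return !A || B; }

// Plain disjunction.
bool disjunctionForm(bool A, bool B) { return A || B; }

using Pred = bool (*)(bool, bool);

// Returns true when F and G agree on all four input combinations.
bool agreeEverywhere(Pred F, Pred G) {
  for (int A = 0; A < 2; ++A)
    for (int B = 0; B < 2; ++B)
      if (F(A != 0, B != 0) != G(A != 0, B != 0))
        return false;
  return true;
}
```

`A ? B : true` agrees with `!A || B` on every input, while `A || B` gives a different answer for (false, false) and (true, false), so it is worth double-checking which form is intended before applying a simplification.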

https://github.com/llvm/llvm-project/pull/154726


[llvm-branch-commits] [clang] [analyzer][docs] CSA release notes for clang-21 (PR #154600)

2025-08-21 Thread Balazs Benics via llvm-branch-commits

steakhal wrote:

This PR is blocked by #154608

https://github.com/llvm/llvm-project/pull/154600


[llvm-branch-commits] [llvm] [AArch64] Split large loop dependence masks (PR #153187)

2025-08-21 Thread Sander de Smalen via llvm-branch-commits


@@ -5248,49 +5248,94 @@ AArch64TargetLowering::LowerLOOP_DEPENDENCE_MASK(SDValue Op,
  SelectionDAG &DAG) const {
   SDLoc DL(Op);
   uint64_t EltSize = Op.getConstantOperandVal(2);
-  EVT VT = Op.getValueType();
+  EVT FullVT = Op.getValueType();
+  unsigned NumElements = FullVT.getVectorMinNumElements();
+  unsigned NumSplits = 0;
+  EVT EltVT;
   switch (EltSize) {
   case 1:
-if (VT != MVT::v16i8 && VT != MVT::nxv16i1)
-  return SDValue();
+EltVT = MVT::i8;
 break;
   case 2:
-if (VT != MVT::v8i8 && VT != MVT::nxv8i1)
-  return SDValue();
+if (NumElements >= 16)
+  NumSplits = NumElements / 16;
+EltVT = MVT::i16;
 break;
   case 4:
-if (VT != MVT::v4i16 && VT != MVT::nxv4i1)
-  return SDValue();
+if (NumElements >= 8)
+  NumSplits = NumElements / 8;
+EltVT = MVT::i32;
 break;
   case 8:
-if (VT != MVT::v2i32 && VT != MVT::nxv2i1)
-  return SDValue();
+if (NumElements >= 4)
+  NumSplits = NumElements / 4;
+EltVT = MVT::i64;
 break;
   default:
 // Other element sizes are incompatible with whilewr/rw, so expand instead
 return SDValue();
   }
 
-  SDValue PtrA = Op.getOperand(0);
-  SDValue PtrB = Op.getOperand(1);
+  auto LowerToWhile = [&](EVT VT, unsigned AddrScale) {
+SDValue PtrA = Op.getOperand(0);
+SDValue PtrB = Op.getOperand(1);
 
-  if (VT.isScalableVT())
-return DAG.getNode(Op.getOpcode(), DL, VT, PtrA, PtrB, Op.getOperand(2));
+EVT StoreVT = EVT::getVectorVT(*DAG.getContext(), EltVT,
+   VT.getVectorMinNumElements(), false);
+if (AddrScale > 0) {
+  unsigned Offset = StoreVT.getStoreSizeInBits() / 8 * AddrScale;
+  SDValue Addend;
 
-  // We can use the SVE whilewr/whilerw instruction to lower this
-  // intrinsic by creating the appropriate sequence of scalable vector
-  // operations and then extracting a fixed-width subvector from the scalable
-  // vector. Scalable vector variants are already legal.
-  EVT ContainerVT =
-  EVT::getVectorVT(*DAG.getContext(), VT.getVectorElementType(),
-   VT.getVectorNumElements(), true);
-  EVT WhileVT = ContainerVT.changeElementType(MVT::i1);
+  if (VT.isScalableVT())
+Addend = DAG.getVScale(DL, MVT::i64, APInt(64, Offset));
+  else
+Addend = DAG.getConstant(Offset, DL, MVT::i64);
 
-  SDValue Mask =
-  DAG.getNode(Op.getOpcode(), DL, WhileVT, PtrA, PtrB, Op.getOperand(2));
-  SDValue MaskAsInt = DAG.getNode(ISD::SIGN_EXTEND, DL, ContainerVT, Mask);
-  return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, MaskAsInt,
- DAG.getVectorIdxConstant(0, DL));
+  PtrA = DAG.getNode(ISD::ADD, DL, MVT::i64, PtrA, Addend);

sdesmalen-arm wrote:

As I had already pointed out in the other review, it is wrong to increment 
`PtrB`.

https://github.com/llvm/llvm-project/pull/153187


[llvm-branch-commits] [llvm] [AArch64] Split large loop dependence masks (PR #153187)

2025-08-21 Thread Sander de Smalen via llvm-branch-commits


@@ -5248,49 +5248,94 @@ AArch64TargetLowering::LowerLOOP_DEPENDENCE_MASK(SDValue Op,
  SelectionDAG &DAG) const {
   SDLoc DL(Op);
   uint64_t EltSize = Op.getConstantOperandVal(2);
-  EVT VT = Op.getValueType();
+  EVT FullVT = Op.getValueType();
+  unsigned NumElements = FullVT.getVectorMinNumElements();
+  unsigned NumSplits = 0;
+  EVT EltVT;
   switch (EltSize) {
   case 1:
-if (VT != MVT::v16i8 && VT != MVT::nxv16i1)
-  return SDValue();
+EltVT = MVT::i8;
 break;
   case 2:
-if (VT != MVT::v8i8 && VT != MVT::nxv8i1)
-  return SDValue();
+if (NumElements >= 16)
+  NumSplits = NumElements / 16;
+EltVT = MVT::i16;
 break;
   case 4:
-if (VT != MVT::v4i16 && VT != MVT::nxv4i1)
-  return SDValue();
+if (NumElements >= 8)
+  NumSplits = NumElements / 8;
+EltVT = MVT::i32;
 break;
   case 8:
-if (VT != MVT::v2i32 && VT != MVT::nxv2i1)
-  return SDValue();
+if (NumElements >= 4)
+  NumSplits = NumElements / 4;
+EltVT = MVT::i64;
 break;
   default:
 // Other element sizes are incompatible with whilewr/rw, so expand instead
 return SDValue();
   }
 
-  SDValue PtrA = Op.getOperand(0);
-  SDValue PtrB = Op.getOperand(1);
+  auto LowerToWhile = [&](EVT VT, unsigned AddrScale) {
+SDValue PtrA = Op.getOperand(0);
+SDValue PtrB = Op.getOperand(1);
 
-  if (VT.isScalableVT())
-return DAG.getNode(Op.getOpcode(), DL, VT, PtrA, PtrB, Op.getOperand(2));
+EVT StoreVT = EVT::getVectorVT(*DAG.getContext(), EltVT,
+   VT.getVectorMinNumElements(), false);
+if (AddrScale > 0) {
+  unsigned Offset = StoreVT.getStoreSizeInBits() / 8 * AddrScale;
+  SDValue Addend;
 
-  // We can use the SVE whilewr/whilerw instruction to lower this
-  // intrinsic by creating the appropriate sequence of scalable vector
-  // operations and then extracting a fixed-width subvector from the scalable
-  // vector. Scalable vector variants are already legal.
-  EVT ContainerVT =
-  EVT::getVectorVT(*DAG.getContext(), VT.getVectorElementType(),
-   VT.getVectorNumElements(), true);
-  EVT WhileVT = ContainerVT.changeElementType(MVT::i1);
+  if (VT.isScalableVT())
+Addend = DAG.getVScale(DL, MVT::i64, APInt(64, Offset));
+  else
+Addend = DAG.getConstant(Offset, DL, MVT::i64);
 
-  SDValue Mask =
-  DAG.getNode(Op.getOpcode(), DL, WhileVT, PtrA, PtrB, Op.getOperand(2));
-  SDValue MaskAsInt = DAG.getNode(ISD::SIGN_EXTEND, DL, ContainerVT, Mask);
-  return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, MaskAsInt,
- DAG.getVectorIdxConstant(0, DL));
+  PtrA = DAG.getNode(ISD::ADD, DL, MVT::i64, PtrA, Addend);
+  PtrB = DAG.getNode(ISD::ADD, DL, MVT::i64, PtrB, Addend);
+}
+
+if (VT.isScalableVT())
+  return DAG.getNode(Op.getOpcode(), DL, VT, PtrA, PtrB, Op.getOperand(2));
+
+// We can use the SVE whilewr/whilerw instruction to lower this
+// intrinsic by creating the appropriate sequence of scalable vector
+// operations and then extracting a fixed-width subvector from the scalable
+// vector. Scalable vector variants are already legal.
+EVT ContainerVT =
+EVT::getVectorVT(*DAG.getContext(), VT.getVectorElementType(),
+ VT.getVectorNumElements(), true);
+EVT WhileVT = ContainerVT.changeElementType(MVT::i1);
+
+SDValue Mask =
+DAG.getNode(Op.getOpcode(), DL, WhileVT, PtrA, PtrB, Op.getOperand(2));
+SDValue MaskAsInt = DAG.getNode(ISD::SIGN_EXTEND, DL, ContainerVT, Mask);
+return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, MaskAsInt,
+   DAG.getVectorIdxConstant(0, DL));
+  };
+
+  if (NumSplits == 0)
+return LowerToWhile(FullVT, 0);
+
+  SDValue FullVec = DAG.getUNDEF(FullVT);
+
+  unsigned NumElementsPerSplit = NumElements / (2 * NumSplits);
+  EVT PartVT =
+  EVT::getVectorVT(*DAG.getContext(), FullVT.getVectorElementType(),
+   NumElementsPerSplit, FullVT.isScalableVT());
+  for (unsigned Split = 0, InsertIdx = 0; Split < NumSplits;

sdesmalen-arm wrote:

Rather than using a loop, it seems simpler to just split the operation in two, 
calculate the Lo and Hi halves, and then concatenate them together. They will 
be recursively split, if necessary.

https://github.com/llvm/llvm-project/pull/153187


[llvm-branch-commits] [llvm] [AArch64] Split large loop dependence masks (PR #153187)

2025-08-21 Thread Sander de Smalen via llvm-branch-commits


@@ -5248,49 +5248,94 @@ AArch64TargetLowering::LowerLOOP_DEPENDENCE_MASK(SDValue Op,
  SelectionDAG &DAG) const {
   SDLoc DL(Op);
   uint64_t EltSize = Op.getConstantOperandVal(2);
-  EVT VT = Op.getValueType();
+  EVT FullVT = Op.getValueType();
+  unsigned NumElements = FullVT.getVectorMinNumElements();
+  unsigned NumSplits = 0;
+  EVT EltVT;
   switch (EltSize) {
   case 1:
-if (VT != MVT::v16i8 && VT != MVT::nxv16i1)
-  return SDValue();
+EltVT = MVT::i8;
 break;
   case 2:
-if (VT != MVT::v8i8 && VT != MVT::nxv8i1)
-  return SDValue();
+if (NumElements >= 16)

sdesmalen-arm wrote:

Note that this case can quite easily be implemented with some patterns that do 
the operation returning a wider predicate type (i.e. using a smaller element 
size), and then extracting the sub-vector.

https://github.com/llvm/llvm-project/pull/153187


[llvm-branch-commits] [llvm] [AArch64] Split large loop dependence masks (PR #153187)

2025-08-21 Thread Sander de Smalen via llvm-branch-commits


@@ -5248,49 +5248,94 @@ AArch64TargetLowering::LowerLOOP_DEPENDENCE_MASK(SDValue Op,
  SelectionDAG &DAG) const {
   SDLoc DL(Op);
   uint64_t EltSize = Op.getConstantOperandVal(2);
-  EVT VT = Op.getValueType();
+  EVT FullVT = Op.getValueType();
+  unsigned NumElements = FullVT.getVectorMinNumElements();
+  unsigned NumSplits = 0;
+  EVT EltVT;
   switch (EltSize) {
   case 1:
-if (VT != MVT::v16i8 && VT != MVT::nxv16i1)
-  return SDValue();
+EltVT = MVT::i8;
 break;
   case 2:
-if (VT != MVT::v8i8 && VT != MVT::nxv8i1)
-  return SDValue();
+if (NumElements >= 16)
+  NumSplits = NumElements / 16;
+EltVT = MVT::i16;
 break;
   case 4:
-if (VT != MVT::v4i16 && VT != MVT::nxv4i1)
-  return SDValue();
+if (NumElements >= 8)
+  NumSplits = NumElements / 8;
+EltVT = MVT::i32;
 break;
   case 8:
-if (VT != MVT::v2i32 && VT != MVT::nxv2i1)
-  return SDValue();
+if (NumElements >= 4)
+  NumSplits = NumElements / 4;
+EltVT = MVT::i64;
 break;
   default:
 // Other element sizes are incompatible with whilewr/rw, so expand instead
 return SDValue();
   }
 
-  SDValue PtrA = Op.getOperand(0);
-  SDValue PtrB = Op.getOperand(1);
+  auto LowerToWhile = [&](EVT VT, unsigned AddrScale) {
+SDValue PtrA = Op.getOperand(0);
+SDValue PtrB = Op.getOperand(1);
 
-  if (VT.isScalableVT())
-return DAG.getNode(Op.getOpcode(), DL, VT, PtrA, PtrB, Op.getOperand(2));
+EVT StoreVT = EVT::getVectorVT(*DAG.getContext(), EltVT,
+   VT.getVectorMinNumElements(), false);

sdesmalen-arm wrote:

It is not necessary to create StoreVT, you can do `unsigned Offset = 
VT.getVectorMinNumElements() * EltSize * AddrScale`.
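
The arithmetic behind that suggestion can be checked in isolation. The sketch below assumes a vector of `NumElts` elements, each `EltSize` bytes wide, whose store size in bits is `NumElts * EltSize * 8`. That assumption is meant to cover the byte-sized element types used in the patch (i8, i16, i32, i64). The function names are hypothetical stand-ins, not LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Models the patch's expression StoreVT.getStoreSizeInBits() / 8 * AddrScale,
// with the store size assumed to be NumElts * EltSize * 8 bits.
constexpr uint64_t offsetViaStoreSize(uint64_t NumElts, uint64_t EltSize,
                                      uint64_t AddrScale) {
  return (NumElts * EltSize * 8) / 8 * AddrScale;
}

// Models the suggested direct form:
// VT.getVectorMinNumElements() * EltSize * AddrScale.
constexpr uint64_t offsetDirect(uint64_t NumElts, uint64_t EltSize,
                                uint64_t AddrScale) {
  return NumElts * EltSize * AddrScale;
}

// Both forms give the same byte offset for a few representative shapes.
static_assert(offsetViaStoreSize(16, 1, 1) == offsetDirect(16, 1, 1), "i8");
static_assert(offsetViaStoreSize(8, 2, 1) == offsetDirect(8, 2, 1), "i16");
static_assert(offsetViaStoreSize(4, 4, 3) == offsetDirect(4, 4, 3), "i32");
static_assert(offsetViaStoreSize(2, 8, 2) == offsetDirect(2, 8, 2), "i64");
```

Under that assumption the intermediate vector type adds nothing to the computation, which is the point of the review comment.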

https://github.com/llvm/llvm-project/pull/153187


[llvm-branch-commits] [llvm] AMDGPU: Add test for mfma rewrite pass respecting optnone (PR #153025)

2025-08-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/153025

From d53062ac44d3770b2f4e3f993bfed0b26294200d Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 11 Aug 2025 19:05:44 +0900
Subject: [PATCH] AMDGPU: Add test for mfma rewrite pass respecting optnone

---
 .../AMDGPU/rewrite-vgpr-mfma-to-agpr.ll   | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll
index 343a5c8511ee9..6f7809f46d10a 100644
--- a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll
@@ -3,6 +3,40 @@
 
 target triple = "amdgcn-amd-amdhsa"
 
+define amdgpu_kernel void @respect_optnone(double %arg0, double %arg1, ptr 
addrspace(1) %ptr) #4 {
+; CHECK-LABEL: respect_optnone:
+; CHECK:   ; %bb.0: ; %bb
+; CHECK-NEXT:s_load_dwordx2 s[0:1], s[4:5], 0x0
+; CHECK-NEXT:s_load_dwordx2 s[2:3], s[4:5], 0x8
+; CHECK-NEXT:s_nop 0
+; CHECK-NEXT:s_load_dwordx2 s[4:5], s[4:5], 0x10
+; CHECK-NEXT:s_mov_b32 s6, 0x3ff
+; CHECK-NEXT:v_and_b32_e64 v0, v0, s6
+; CHECK-NEXT:s_mov_b32 s6, 3
+; CHECK-NEXT:v_lshlrev_b32_e64 v0, s6, v0
+; CHECK-NEXT:s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:global_load_dwordx2 v[0:1], v0, s[4:5]
+; CHECK-NEXT:v_mov_b64_e32 v[2:3], s[0:1]
+; CHECK-NEXT:v_mov_b64_e32 v[4:5], s[2:3]
+; CHECK-NEXT:s_waitcnt vmcnt(0)
+; CHECK-NEXT:s_nop 0
+; CHECK-NEXT:v_mfma_f64_4x4x4_4b_f64 v[0:1], v[2:3], v[4:5], v[0:1]
+; CHECK-NEXT:s_nop 5
+; CHECK-NEXT:v_accvgpr_write_b32 a0, v0
+; CHECK-NEXT:v_accvgpr_write_b32 a1, v1
+; CHECK-NEXT:;;#ASMSTART
+; CHECK-NEXT:; use a[0:1]
+; CHECK-NEXT:;;#ASMEND
+; CHECK-NEXT:s_endpgm
+bb:
+  %id = call i32 @llvm.amdgcn.workitem.id.x()
+  %gep = getelementptr double, ptr addrspace(1) %ptr, i32 %id
+  %src2 = load double, ptr addrspace(1) %gep
+  %mai = call double @llvm.amdgcn.mfma.f64.4x4x4f64(double %arg0, double 
%arg1, double %src2, i32 0, i32 0, i32 0)
+  call void asm sideeffect "; use $0", "a"(double %mai)
+  ret void
+}
+
 define amdgpu_kernel void @test_mfma_f32_32x32x1f32_rewrite_vgpr_mfma(ptr 
addrspace(1) %arg) #0 {
 ; CHECK-LABEL: test_mfma_f32_32x32x1f32_rewrite_vgpr_mfma:
 ; CHECK:   ; %bb.0: ; %bb
@@ -859,3 +893,4 @@ attributes #0 = { nounwind 
"amdgpu-flat-work-group-size"="1,256" "amdgpu-waves-p
 attributes #1 = { mustprogress nofree norecurse nounwind willreturn 
"amdgpu-waves-per-eu"="8,8" }
 attributes #2 = { convergent nocallback nofree nosync nounwind willreturn 
memory(none) }
 attributes #3 = { nocallback nofree nosync nounwind speculatable willreturn 
memory(none) }
+attributes #4 = { nounwind noinline optnone }



[llvm-branch-commits] [clang] [llvm] [mlir] [OMPIRBuilder] Add support for explicit deallocation points (PR #154752)

2025-08-21 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. :warning:

You can test this locally with the following command:

```bash
git-clang-format --diff HEAD~1 HEAD --extensions cpp,h -- \
  clang/lib/CodeGen/CGOpenMPRuntime.cpp \
  clang/lib/CodeGen/CGStmtOpenMP.cpp \
  llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h \
  llvm/include/llvm/Transforms/Utils/CodeExtractor.h \
  llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp \
  llvm/lib/Transforms/IPO/HotColdSplitting.cpp \
  llvm/lib/Transforms/IPO/IROutliner.cpp \
  llvm/lib/Transforms/IPO/OpenMPOpt.cpp \
  llvm/lib/Transforms/Utils/CodeExtractor.cpp \
  llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp \
  llvm/unittests/Transforms/Utils/CodeExtractorTest.cpp \
  mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
```





View the diff from clang-format here.


``diff
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp 
b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index 959c66593..c9ac20715 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -4365,20 +4365,20 @@ void CodeGenFunction::EmitOMPSectionsDirective(const 
OMPSectionsDirective &S) {
 auto SectionCB = [this, SubStmt](InsertPointTy AllocIP,
  InsertPointTy CodeGenIP,
  ArrayRef DeallocIPs) {
-  OMPBuilderCBHelpers::EmitOMPInlinedRegionBody(
-  *this, SubStmt, AllocIP, CodeGenIP, "section");
+  OMPBuilderCBHelpers::EmitOMPInlinedRegionBody(*this, SubStmt, 
AllocIP,
+CodeGenIP, "section");
   return llvm::Error::success();
 };
 SectionCBVector.push_back(SectionCB);
   }
 } else {
-  auto SectionCB = [this, CapturedStmt](
-   InsertPointTy AllocIP, InsertPointTy CodeGenIP,
-   ArrayRef DeallocIPs) {
-OMPBuilderCBHelpers::EmitOMPInlinedRegionBody(
-*this, CapturedStmt, AllocIP, CodeGenIP, "section");
-return llvm::Error::success();
-  };
+  auto SectionCB =
+  [this, CapturedStmt](InsertPointTy AllocIP, InsertPointTy CodeGenIP,
+   ArrayRef DeallocIPs) {
+OMPBuilderCBHelpers::EmitOMPInlinedRegionBody(
+*this, CapturedStmt, AllocIP, CodeGenIP, "section");
+return llvm::Error::success();
+  };
   SectionCBVector.push_back(SectionCB);
 }
 
diff --git a/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp 
b/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
index 68f1365b6..9e9f943aa 100644
--- a/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
+++ b/llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
@@ -55,9 +55,9 @@ using namespace omp;
   }
 
 #define BODYGENCB_WRAPPER(cb)  
\
-  [&cb](InsertPointTy AllocIP, InsertPointTy CodeGenIP,   \
+  [&cb](InsertPointTy AllocIP, InsertPointTy CodeGenIP,
\
 ArrayRef DeallocIPs) -> Error { 
\
-cb(AllocIP, CodeGenIP, DeallocIPs);   \
+cb(AllocIP, CodeGenIP, DeallocIPs);
\
 return Error::success();   
\
   }
 
@@ -922,8 +922,8 @@ TEST_F(OpenMPIRBuilderTest, ParallelNested) {
 
 ASSERT_EXPECTED_INIT(
 OpenMPIRBuilder::InsertPointTy, AfterIP,
-OMPBuilder.createParallel(InsertPointTy(CGBB, CGBB->end()), AllocIP,
-  {}, InnerBodyGenCB, PrivCB, FiniCB, nullptr,
+OMPBuilder.createParallel(InsertPointTy(CGBB, CGBB->end()), AllocIP, 
{},
+  InnerBodyGenCB, PrivCB, FiniCB, nullptr,
   nullptr, OMP_PROC_BIND_default, false));
 
 Builder.restoreIP(AfterIP);
@@ -1029,18 +1029,18 @@ TEST_F(OpenMPIRBuilderTest, ParallelNested2Inner) {
 
 ASSERT_EXPECTED_INIT(
 OpenMPIRBuilder::InsertPointTy, AfterIP1,
-OMPBuilder.createParallel(InsertPointTy(CGBB, CGBB->end()), AllocIP,
-  {}, InnerBodyGenCB, PrivCB, FiniCB, nullptr,
+OMPBuilder.createParallel(InsertPointTy(CGBB, CGBB->end()), AllocIP, 
{},
+  InnerBodyGenCB, PrivCB, FiniCB, nullptr,
   nullptr, OMP_PROC_BIND_default, false));
 
 Builder.restoreIP(AfterIP1);
 Builder.CreateBr(NewBB1);
 
-ASSERT_EXPECTED_INIT(OpenMPIRBuilder::InsertPointTy, AfterIP2,
- OMPBuilder.createParallel(
- InsertPointTy(NewBB1, NewBB1->end()), AllocIP, {},
- InnerBodyGenCB, PrivCB, FiniCB, nullptr, nullptr,
- OMP_PROC_BIND_default, false));
+ASSERT_EXPECTED_INIT(
+OpenMPIRBuilder::Inser

[llvm-branch-commits] [clang] [llvm] [mlir] [OMPIRBuilder] Add support for explicit deallocation points (PR #154752)

2025-08-21 Thread Sergio Afonso via llvm-branch-commits

skatrak wrote:

PR stack:
- #150922
- #150923
- #150924
- #150925
- #150926
- #150927
- #154752 ◀️

https://github.com/llvm/llvm-project/pull/154752


[llvm-branch-commits] [clang] Move rest of documentation problems that found their way to the SA sec. (PR #154608)

2025-08-21 Thread Oliver Hunt via llvm-branch-commits

ojhunt wrote:

> > Dammit, yeah I wish I could still see the conflict to work out how I 
> > misread it - I suspect the addition of "A new flag - 
> > `-static-libclosure`..." to new flags didn't conflict :-/
> 
> You can actually, `git show --remerge-diff 
> 30401b1f918ea359334b507a79118938ffe3c169` 
> ([docs](https://git-scm.com/docs/git-show#Documentation/git-show.txt---remerge-diff)).
>  :)

I think this was a pebkac issue with me using the GH editor to resolve the diff 
- if I merge locally the change seems obvious, and I can't think of a reason to 
make this error. 

https://github.com/llvm/llvm-project/pull/154608


[llvm-branch-commits] [llvm] AMDGPU: Add statistic for number of MFMAs moved to AGPR form (PR #153024)

2025-08-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/153024

From e9fb98098503bbb660159e9eac1b6a6d5a5029c5 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 11 Aug 2025 19:00:54 +0900
Subject: [PATCH] AMDGPU: Add statistic for number of MFMAs moved to AGPR form

---
 llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp | 5 +
 1 file changed, 5 insertions(+)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
index 639796c6cefff..78bd104ed2514 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
@@ -26,6 +26,7 @@
 #include "GCNSubtarget.h"
 #include "SIMachineFunctionInfo.h"
 #include "SIRegisterInfo.h"
+#include "llvm/ADT/Statistic.h"
 #include "llvm/CodeGen/LiveIntervals.h"
 #include "llvm/CodeGen/LiveRegMatrix.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
@@ -38,6 +39,9 @@ using namespace llvm;
 
 namespace {
 
+STATISTIC(NumMFMAsRewrittenToAGPR,
+  "Number of MFMA instructions rewritten to use AGPR form");
+
 class AMDGPURewriteAGPRCopyMFMAImpl {
   MachineFunction &MF;
   const GCNSubtarget &ST;
@@ -263,6 +267,7 @@ bool AMDGPURewriteAGPRCopyMFMAImpl::tryReassigningMFMAChain(
 int NewMFMAOp =
 AMDGPU::getMFMASrcCVDstAGPROp(RewriteCandidate->getOpcode());
 RewriteCandidate->setDesc(TII.get(NewMFMAOp));
+++NumMFMAsRewrittenToAGPR;
   }
 
   return true;

[llvm-branch-commits] [llvm] AMDGPU: Add tests for every mfma intrinsic v-to-a mapping (PR #153026)

2025-08-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/153026

From bc0f27ae4ec17649dff589e1dc3468b8ad0f4e45 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 11 Aug 2025 19:12:49 +0900
Subject: [PATCH 1/2] AMDGPU: Add tests for every mfma intrinsic v-to-a mapping

Make sure the MFMA VGPR to AGPR InstrMapping table is complete.
I think I got everything, except the full cross product of input
types with the mfma scale intrinsics. Also makes sure we have
coverage for smfmac and mfma_scale cases.
---
 .../rewrite-vgpr-mfma-to-agpr.gfx90a.ll   | 141 +++
 .../rewrite-vgpr-mfma-to-agpr.gfx950.ll   | 664 ++
 .../AMDGPU/rewrite-vgpr-mfma-to-agpr.ll   | 867 ++
 3 files changed, 1672 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.gfx90a.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.gfx950.ll

diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.gfx90a.ll b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.gfx90a.ll
new file mode 100644
index 0..7d00b12e7334a
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.gfx90a.ll
@@ -0,0 +1,141 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 5
+; RUN: llc -mcpu=gfx90a -amdgpu-mfma-vgpr-form < %s | FileCheck %s
+
+target triple = "amdgcn-amd-amdhsa"
+
+define void @test_rewrite_mfma_i32_32x32x8i8(i32 %arg0, i32 %arg1, ptr 
addrspace(1) %ptr) #0 {
+; CHECK-LABEL: test_rewrite_mfma_i32_32x32x8i8:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:global_load_dwordx4 a[12:15], v[2:3], off offset:48
+; CHECK-NEXT:global_load_dwordx4 a[8:11], v[2:3], off offset:32
+; CHECK-NEXT:global_load_dwordx4 a[4:7], v[2:3], off offset:16
+; CHECK-NEXT:global_load_dwordx4 a[0:3], v[2:3], off
+; CHECK-NEXT:s_waitcnt vmcnt(0)
+; CHECK-NEXT:v_mfma_i32_32x32x8i8 a[0:15], v0, v1, a[0:15]
+; CHECK-NEXT:;;#ASMSTART
+; CHECK-NEXT:; use a[0:15]
+; CHECK-NEXT:;;#ASMEND
+; CHECK-NEXT:s_setpc_b64 s[30:31]
+  %src2 = load <16 x i32>, ptr addrspace(1) %ptr
+  %mai = call <16 x i32> @llvm.amdgcn.mfma.i32.32x32x8i8(i32 %arg0, i32 %arg1, 
<16 x i32> %src2, i32 0, i32 0, i32 0)
+  call void asm sideeffect "; use $0", "a"(<16 x i32> %mai)
+  ret void
+}
+
+define void @test_rewrite_mfma_i32_16x16x16i8(i32 %arg0, i32 %arg1, ptr 
addrspace(1) %ptr) #0 {
+; CHECK-LABEL: test_rewrite_mfma_i32_16x16x16i8:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:global_load_dwordx4 a[0:3], v[2:3], off
+; CHECK-NEXT:s_waitcnt vmcnt(0)
+; CHECK-NEXT:v_mfma_i32_16x16x16i8 a[0:3], v0, v1, a[0:3]
+; CHECK-NEXT:;;#ASMSTART
+; CHECK-NEXT:; use a[0:3]
+; CHECK-NEXT:;;#ASMEND
+; CHECK-NEXT:s_setpc_b64 s[30:31]
+  %src2 = load <4 x i32>, ptr addrspace(1) %ptr
+  %mai = call <4 x i32> @llvm.amdgcn.mfma.i32.16x16x16i8(i32 %arg0, i32 %arg1, 
<4 x i32> %src2, i32 0, i32 0, i32 0)
+  call void asm sideeffect "; use $0", "a"(<4 x i32> %mai)
+  ret void
+}
+
+define void @test_rewrite_mfma_f32_32x32x2bf16(<2 x i16> %arg0, <2 x i16> 
%arg1, ptr addrspace(1) %ptr) #0 {
+; CHECK-LABEL: test_rewrite_mfma_f32_32x32x2bf16:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:global_load_dwordx4 a[28:31], v[2:3], off offset:112
+; CHECK-NEXT:global_load_dwordx4 a[24:27], v[2:3], off offset:96
+; CHECK-NEXT:global_load_dwordx4 a[20:23], v[2:3], off offset:80
+; CHECK-NEXT:global_load_dwordx4 a[16:19], v[2:3], off offset:64
+; CHECK-NEXT:global_load_dwordx4 a[12:15], v[2:3], off offset:48
+; CHECK-NEXT:global_load_dwordx4 a[8:11], v[2:3], off offset:32
+; CHECK-NEXT:global_load_dwordx4 a[4:7], v[2:3], off offset:16
+; CHECK-NEXT:global_load_dwordx4 a[0:3], v[2:3], off
+; CHECK-NEXT:s_waitcnt vmcnt(0)
+; CHECK-NEXT:v_mfma_f32_32x32x2bf16 a[0:31], v0, v1, a[0:31]
+; CHECK-NEXT:;;#ASMSTART
+; CHECK-NEXT:; use a[0:31]
+; CHECK-NEXT:;;#ASMEND
+; CHECK-NEXT:s_setpc_b64 s[30:31]
+  %src2 = load <32 x float>, ptr addrspace(1) %ptr
+  %mai = call <32 x float> @llvm.amdgcn.mfma.f32.32x32x2bf16(<2 x i16> %arg0, 
<2 x i16> %arg1, <32 x float> %src2, i32 0, i32 0, i32 0)
+  call void asm sideeffect "; use $0", "a"(<32 x float> %mai)
+  ret void
+}
+
+define void @test_rewrite_mfma_f32_16x16x2bf16(<2 x i16> %arg0, <2 x i16> 
%arg1, ptr addrspace(1) %ptr) #0 {
+; CHECK-LABEL: test_rewrite_mfma_f32_16x16x2bf16:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:global_load_dwordx4 a[12:15], v[2:3], off offset:48
+; CHECK-NEXT:global_load_dwordx4 a[8:11], v[2:3], off offset:32
+; CHECK-NEXT:global_load_dwordx4 a[4:7], v[2:3], off offset:16
+; CHECK-NEXT:global_load_dwordx4 a[0:3], v[2:3], off
+; CHECK-NEXT:s_waitcn

[llvm-branch-commits] [llvm] AMDGPU: Handle V->A MFMA copy from case with immediate src2 (PR #153023)

2025-08-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/153023

>From 579e971aabb02ee4c7e0d5e628fbb47b86d0ed63 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 11 Aug 2025 18:22:09 +0900
Subject: [PATCH] AMDGPU: Handle V->A MFMA copy from case with immediate src2

Handle a special case for copies from AGPR to VGPR on the MFMA inputs.
If the "input" is really a subregister def, we will not see the
usual copy to VGPR for src2, only the read of the subregister def.
Not sure if this pattern appears in practice.
---
 llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp  | 11 ++-
 .../AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir|  4 ++--
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
index 639796c6cefff..fb3eb87240c93 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
@@ -377,13 +377,14 @@ bool AMDGPURewriteAGPRCopyMFMAImpl::tryFoldCopiesFromAGPR(
 Register CopyDstReg = UseMI.getOperand(0).getReg();
 if (!CopyDstReg.isVirtual())
   continue;
+for (MachineOperand &CopyUseMO : MRI.reg_nodbg_operands(CopyDstReg)) {
+  if (!CopyUseMO.readsReg())
+continue;
 
-for (MachineInstr &CopyUseMI : MRI.use_instructions(CopyDstReg)) {
+  MachineInstr &CopyUseMI = *CopyUseMO.getParent();
   if (isRewriteCandidate(CopyUseMI)) {
-const MachineOperand *Op =
-CopyUseMI.findRegisterUseOperand(CopyDstReg, /*TRI=*/nullptr);
-if (tryReassigningMFMAChain(CopyUseMI, Op->getOperandNo(),
-VRM.getPhys(Op->getReg())))
+if (tryReassigningMFMAChain(CopyUseMI, CopyUseMO.getOperandNo(),
+VRM.getPhys(CopyUseMO.getReg())))
   MadeChange = true;
   }
 }
diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir 
b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir
index 1c5e0e362e359..bcd0e027b209e 100644
--- a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr-copy-from.mir
@@ -187,8 +187,8 @@ body: |
 ; CHECK-NEXT: [[COPY1:%[0-9]+]]:av_64_align2 = COPY $vgpr0_vgpr1
 ; CHECK-NEXT: [[COPY2:%[0-9]+]]:av_64_align2 = COPY $vgpr2_vgpr3
 ; CHECK-NEXT: [[GLOBAL_LOAD_DWORDX4_:%[0-9]+]]:areg_128_align2 = 
GLOBAL_LOAD_DWORDX4 [[COPY]], 0, 0, implicit $exec :: (load (s128), addrspace 1)
-; CHECK-NEXT: [[COPY3:%[0-9]+]]:vreg_128_align2 = COPY 
[[GLOBAL_LOAD_DWORDX4_]]
-; CHECK-NEXT: [[COPY3:%[0-9]+]].sub0_sub1:vreg_128_align2 = 
V_MFMA_F64_4X4X4F64_vgprcd_e64 [[COPY1]], [[COPY2]], 0, 0, 0, 0, implicit 
$mode, implicit $exec
+; CHECK-NEXT: [[COPY3:%[0-9]+]]:areg_128_align2 = COPY 
[[GLOBAL_LOAD_DWORDX4_]]
+; CHECK-NEXT: [[COPY3:%[0-9]+]].sub0_sub1:areg_128_align2 = 
V_MFMA_F64_4X4X4F64_e64 [[COPY1]], [[COPY2]], 0, 0, 0, 0, implicit $mode, 
implicit $exec
 ; CHECK-NEXT: GLOBAL_STORE_DWORDX4 [[COPY]], [[COPY3]], 0, 0, implicit 
$exec :: (store (s128), addrspace 1)
 ; CHECK-NEXT: SI_RETURN
 %0:vreg_64_align2 = COPY $vgpr4_vgpr5

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Add test for mfma rewrite pass respecting optnone (PR #153025)

2025-08-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/153025

>From d53062ac44d3770b2f4e3f993bfed0b26294200d Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 11 Aug 2025 19:05:44 +0900
Subject: [PATCH] AMDGPU: Add test for mfma rewrite pass respecting optnone

---
 .../AMDGPU/rewrite-vgpr-mfma-to-agpr.ll   | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll 
b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll
index 343a5c8511ee9..6f7809f46d10a 100644
--- a/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll
@@ -3,6 +3,40 @@
 
 target triple = "amdgcn-amd-amdhsa"
 
+define amdgpu_kernel void @respect_optnone(double %arg0, double %arg1, ptr 
addrspace(1) %ptr) #4 {
+; CHECK-LABEL: respect_optnone:
+; CHECK:   ; %bb.0: ; %bb
+; CHECK-NEXT:s_load_dwordx2 s[0:1], s[4:5], 0x0
+; CHECK-NEXT:s_load_dwordx2 s[2:3], s[4:5], 0x8
+; CHECK-NEXT:s_nop 0
+; CHECK-NEXT:s_load_dwordx2 s[4:5], s[4:5], 0x10
+; CHECK-NEXT:s_mov_b32 s6, 0x3ff
+; CHECK-NEXT:v_and_b32_e64 v0, v0, s6
+; CHECK-NEXT:s_mov_b32 s6, 3
+; CHECK-NEXT:v_lshlrev_b32_e64 v0, s6, v0
+; CHECK-NEXT:s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:global_load_dwordx2 v[0:1], v0, s[4:5]
+; CHECK-NEXT:v_mov_b64_e32 v[2:3], s[0:1]
+; CHECK-NEXT:v_mov_b64_e32 v[4:5], s[2:3]
+; CHECK-NEXT:s_waitcnt vmcnt(0)
+; CHECK-NEXT:s_nop 0
+; CHECK-NEXT:v_mfma_f64_4x4x4_4b_f64 v[0:1], v[2:3], v[4:5], v[0:1]
+; CHECK-NEXT:s_nop 5
+; CHECK-NEXT:v_accvgpr_write_b32 a0, v0
+; CHECK-NEXT:v_accvgpr_write_b32 a1, v1
+; CHECK-NEXT:;;#ASMSTART
+; CHECK-NEXT:; use a[0:1]
+; CHECK-NEXT:;;#ASMEND
+; CHECK-NEXT:s_endpgm
+bb:
+  %id = call i32 @llvm.amdgcn.workitem.id.x()
+  %gep = getelementptr double, ptr addrspace(1) %ptr, i32 %id
+  %src2 = load double, ptr addrspace(1) %gep
+  %mai = call double @llvm.amdgcn.mfma.f64.4x4x4f64(double %arg0, double 
%arg1, double %src2, i32 0, i32 0, i32 0)
+  call void asm sideeffect "; use $0", "a"(double %mai)
+  ret void
+}
+
 define amdgpu_kernel void @test_mfma_f32_32x32x1f32_rewrite_vgpr_mfma(ptr 
addrspace(1) %arg) #0 {
 ; CHECK-LABEL: test_mfma_f32_32x32x1f32_rewrite_vgpr_mfma:
 ; CHECK:   ; %bb.0: ; %bb
@@ -859,3 +893,4 @@ attributes #0 = { nounwind 
"amdgpu-flat-work-group-size"="1,256" "amdgpu-waves-p
 attributes #1 = { mustprogress nofree norecurse nounwind willreturn 
"amdgpu-waves-per-eu"="8,8" }
 attributes #2 = { convergent nocallback nofree nosync nounwind willreturn 
memory(none) }
 attributes #3 = { nocallback nofree nosync nounwind speculatable willreturn 
memory(none) }
+attributes #4 = { nounwind noinline optnone }



[llvm-branch-commits] [llvm] AMDGPU: Try to unspill VGPRs after rewriting MFMAs to AGPR form (PR #154323)

2025-08-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/154323

>From 5adb38cc0c45d976d54e5faddd10814b2b42 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 19 Aug 2025 13:12:37 +0900
Subject: [PATCH] AMDGPU: Try to unspill VGPRs after rewriting MFMAs to AGPR
 form

After replacing VGPR MFMAs with the AGPR form, we've alleviated VGPR
pressure which may have triggered spills during allocation. Identify
these spill slots, and try to reassign them to newly freed VGPRs,
and replace the spill instructions with copies.

Fixes #154260
---
 .../AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp  | 169 +-
 .../unspill-vgpr-after-rewrite-vgpr-mfma.ll   |  44 +
 2 files changed, 172 insertions(+), 41 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
index 639796c6cefff..218dc90d9935f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURewriteAGPRCopyMFMA.cpp
@@ -28,6 +28,7 @@
 #include "SIRegisterInfo.h"
 #include "llvm/CodeGen/LiveIntervals.h"
 #include "llvm/CodeGen/LiveRegMatrix.h"
+#include "llvm/CodeGen/LiveStacks.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/VirtRegMap.h"
 #include "llvm/InitializePasses.h"
@@ -38,6 +39,9 @@ using namespace llvm;
 
 namespace {
 
+/// Map from spill slot frame index to list of instructions which reference it.
+using SpillReferenceMap = DenseMap>;
+
 class AMDGPURewriteAGPRCopyMFMAImpl {
   MachineFunction &MF;
   const GCNSubtarget &ST;
@@ -47,6 +51,7 @@ class AMDGPURewriteAGPRCopyMFMAImpl {
   VirtRegMap &VRM;
   LiveRegMatrix &LRM;
   LiveIntervals &LIS;
+  LiveStacks &LSS;
   const RegisterClassInfo &RegClassInfo;
 
   bool attemptReassignmentsToAGPR(SmallSetVector &InterferingRegs,
@@ -55,10 +60,11 @@ class AMDGPURewriteAGPRCopyMFMAImpl {
 public:
   AMDGPURewriteAGPRCopyMFMAImpl(MachineFunction &MF, VirtRegMap &VRM,
 LiveRegMatrix &LRM, LiveIntervals &LIS,
+LiveStacks &LSS,
 const RegisterClassInfo &RegClassInfo)
   : MF(MF), ST(MF.getSubtarget<GCNSubtarget>()), TII(*ST.getInstrInfo()),
 TRI(*ST.getRegisterInfo()), MRI(MF.getRegInfo()), VRM(VRM), LRM(LRM),
-LIS(LIS), RegClassInfo(RegClassInfo) {}
+LIS(LIS), LSS(LSS), RegClassInfo(RegClassInfo) {}
 
   bool isRewriteCandidate(const MachineInstr &MI) const {
 return TII.isMAI(MI) && AMDGPU::getMFMASrcCVDstAGPROp(MI.getOpcode()) != 
-1;
@@ -106,6 +112,22 @@ class AMDGPURewriteAGPRCopyMFMAImpl {
 
   bool tryFoldCopiesToAGPR(Register VReg, MCRegister AssignedAGPR) const;
   bool tryFoldCopiesFromAGPR(Register VReg, MCRegister AssignedAGPR) const;
+
+  /// Replace spill instruction \p SpillMI which loads/stores from/to \p 
SpillFI
+  /// with a COPY to the replacement register value \p VReg.
+  void replaceSpillWithCopyToVReg(MachineInstr &SpillMI, int SpillFI,
+  Register VReg) const;
+
+  /// Create a map from frame index to use instructions for spills. If a use of
+  /// the frame index does not consist only of spill instructions, it will not
+  /// be included in the map.
+  void collectSpillIndexUses(ArrayRef StackIntervals,
+ SpillReferenceMap &Map) const;
+
+  /// Attempt to unspill VGPRs by finding a free register and replacing the
+  /// spill instructions with copies.
+  void eliminateSpillsOfReassignedVGPRs() const;
+
   bool run(MachineFunction &MF) const;
 };
 
@@ -392,6 +414,133 @@ bool AMDGPURewriteAGPRCopyMFMAImpl::tryFoldCopiesFromAGPR(
   return MadeChange;
 }
 
+void AMDGPURewriteAGPRCopyMFMAImpl::replaceSpillWithCopyToVReg(
+MachineInstr &SpillMI, int SpillFI, Register VReg) const {
+  const DebugLoc &DL = SpillMI.getDebugLoc();
+  MachineBasicBlock &MBB = *SpillMI.getParent();
+  MachineInstr *NewCopy;
+  if (SpillMI.mayStore()) {
+NewCopy = BuildMI(MBB, SpillMI, DL, TII.get(TargetOpcode::COPY), VReg)
+  .add(SpillMI.getOperand(0));
+  } else {
+NewCopy = BuildMI(MBB, SpillMI, DL, TII.get(TargetOpcode::COPY))
+  .add(SpillMI.getOperand(0))
+  .addReg(VReg);
+  }
+
+  LIS.ReplaceMachineInstrInMaps(SpillMI, *NewCopy);
+  SpillMI.eraseFromParent();
+}
+
+void AMDGPURewriteAGPRCopyMFMAImpl::collectSpillIndexUses(
+ArrayRef StackIntervals, SpillReferenceMap &Map) const {
+
+  SmallSet NeededFrameIndexes;
+  for (const LiveInterval *LI : StackIntervals)
+NeededFrameIndexes.insert(LI->reg().stackSlotIndex());
+
+  for (MachineBasicBlock &MBB : MF) {
+for (MachineInstr &MI : MBB) {
+  for (MachineOperand &MO : MI.operands()) {
+if (!MO.isFI() || !NeededFrameIndexes.count(MO.getIndex()))
+  continue;
+
+SmallVector &References = Map[MO.getIndex()];
+if (TII.isVGPRSpill(MI)) {
+  References.push_back(&MI);
+  break;
+

[llvm-branch-commits] [llvm] AMDGPU: Add baseline test for unspilling VGPRs after MFMA rewrite (PR #154322)

2025-08-21 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/154322

>From 883e110c8f86719a810c4d5a1930434af532194c Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 19 Aug 2025 21:29:05 +0900
Subject: [PATCH] AMDGPU: Add baseline test for unspilling VGPRs after MFMA
 rewrite

Test for #154260
---
 .../unspill-vgpr-after-rewrite-vgpr-mfma.ll   | 454 ++
 1 file changed, 454 insertions(+)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/unspill-vgpr-after-rewrite-vgpr-mfma.ll

diff --git a/llvm/test/CodeGen/AMDGPU/unspill-vgpr-after-rewrite-vgpr-mfma.ll 
b/llvm/test/CodeGen/AMDGPU/unspill-vgpr-after-rewrite-vgpr-mfma.ll
new file mode 100644
index 0..122d46b39ff32
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/unspill-vgpr-after-rewrite-vgpr-mfma.ll
@@ -0,0 +1,454 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -amdgpu-mfma-vgpr-form < %s 
| FileCheck %s
+
+; After reassigning the MFMA to use AGPRs, we've alleviated enough
+; register pressure to try eliminating the spill of %spill with the freed
+; up VGPR.
+define void @eliminate_spill_after_mfma_rewrite(i32 %x, i32 %y, <4 x i32> 
%arg, ptr addrspace(1) inreg %ptr) #0 {
+; CHECK-LABEL: eliminate_spill_after_mfma_rewrite:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:v_accvgpr_write_b32 a3, v5
+; CHECK-NEXT:v_accvgpr_write_b32 a2, v4
+; CHECK-NEXT:v_accvgpr_write_b32 a1, v3
+; CHECK-NEXT:v_accvgpr_write_b32 a0, v2
+; CHECK-NEXT:buffer_store_dword v40, off, s[0:3], s32 offset:188 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v41, off, s[0:3], s32 offset:184 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v42, off, s[0:3], s32 offset:180 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v43, off, s[0:3], s32 offset:176 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v44, off, s[0:3], s32 offset:172 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v45, off, s[0:3], s32 offset:168 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v46, off, s[0:3], s32 offset:164 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v47, off, s[0:3], s32 offset:160 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v56, off, s[0:3], s32 offset:156 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v57, off, s[0:3], s32 offset:152 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v58, off, s[0:3], s32 offset:148 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v59, off, s[0:3], s32 offset:144 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v60, off, s[0:3], s32 offset:140 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v61, off, s[0:3], s32 offset:136 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v62, off, s[0:3], s32 offset:132 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword v63, off, s[0:3], s32 offset:128 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a32, off, s[0:3], s32 offset:124 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a33, off, s[0:3], s32 offset:120 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a34, off, s[0:3], s32 offset:116 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a35, off, s[0:3], s32 offset:112 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a36, off, s[0:3], s32 offset:108 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a37, off, s[0:3], s32 offset:104 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a38, off, s[0:3], s32 offset:100 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a39, off, s[0:3], s32 offset:96 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a40, off, s[0:3], s32 offset:92 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a41, off, s[0:3], s32 offset:88 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a42, off, s[0:3], s32 offset:84 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a43, off, s[0:3], s32 offset:80 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a44, off, s[0:3], s32 offset:76 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a45, off, s[0:3], s32 offset:72 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a46, off, s[0:3], s32 offset:68 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a47, off, s[0:3], s32 offset:64 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a48, off, s[0:3], s32 offset:60 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a49, off, s[0:3], s32 offset:56 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a50, off, s[0:3], s32 offset:52 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a51, off, s[0:3], s32 offset:48 ; 4-byte 
Folded Spill
+; CHECK-NEXT:buffer_store_dword a52, off, s[0:3], 

[llvm-branch-commits] [clang] [HLSL][DirectX] Add the Qdx-rootsignature-strip driver option (PR #154454)

2025-08-21 Thread Chris B via llvm-branch-commits


@@ -247,6 +247,30 @@ void tools::hlsl::MetalConverter::ConstructJob(
  Exec, CmdArgs, Inputs, Input));
 }
 
+void tools::hlsl::LLVMObjcopy::ConstructJob(Compilation &C, const JobAction 
&JA,
+const InputInfo &Output,
+const InputInfoList &Inputs,
+const ArgList &Args,
+const char *LinkingOutput) const {
+
+  std::string ObjcopyPath = getToolChain().GetProgramPath("llvm-objcopy");
+  const char *Exec = Args.MakeArgString(ObjcopyPath);
+
+  ArgStringList CmdArgs;
+  assert(Inputs.size() == 1 && "Unable to handle multiple inputs.");
+  const InputInfo &Input = Inputs[0];
+  CmdArgs.push_back(Input.getFilename());
+  CmdArgs.push_back(Output.getFilename());
+
+  if (Args.hasArg(options::OPT_dxc_strip_rootsignature)) {
+const char *Frs = Args.MakeArgString("--remove-section=RTS0");
+CmdArgs.push_back(Frs);
+  }
+

llvm-beanz wrote:

Should we assert that `CmdArgs > 2` to ensure that we didn't just invoke 
objcopy for no reason?

https://github.com/llvm/llvm-project/pull/154454
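
A minimal sketch of the check suggested above, assuming a plain `std::vector<std::string>` as a stand-in for the driver's `ArgStringList`. The helper name `buildObjcopyArgs` and its parameters are hypothetical and not part of the patch; the only detail taken from the diff is the `--remove-section=RTS0` flag.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the argument construction in ConstructJob.
// The first two entries are always the input and output file names.
std::vector<std::string> buildObjcopyArgs(const std::string &Input,
                                          const std::string &Output,
                                          bool StripRootSignature) {
  std::vector<std::string> CmdArgs{Input, Output};

  if (StripRootSignature)
    CmdArgs.push_back("--remove-section=RTS0");

  // Input and output alone mean objcopy would only copy the file,
  // so require at least one option that actually modifies it.
  assert(CmdArgs.size() > 2 && "objcopy invoked without any modification");
  return CmdArgs;
}
```

With the flag set, the list has three entries. With it unset, the assertion fires in builds that have assertions enabled, which is the "invoked objcopy for no reason" case the comment asks about.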


[llvm-branch-commits] [clang] [HLSL][DirectX] Add the Qdx-rootsignature-strip driver option (PR #154454)

2025-08-21 Thread Chris B via llvm-branch-commits


@@ -4601,6 +4601,16 @@ void Driver::BuildActions(Compilation &C, DerivedArgList 
&Args,
 Actions.push_back(C.MakeAction(
 LastAction, types::TY_DX_CONTAINER));
 }
+if (TC.requiresObjcopy(Args)) {
+  Action *LastAction = Actions.back();
+  // llvm-objcopy expects a DXIL container, which can either be
+  // validated (in which case they are TY_DX_CONTAINER), or unvalidated
+  // (TY_OBJECT).
+  if (LastAction->getType() == types::TY_DX_CONTAINER ||

llvm-beanz wrote:

Should we be running this before validation rather than after? What does DXC do?

https://github.com/llvm/llvm-project/pull/154454


[llvm-branch-commits] [clang] [HLSL][DirectX] Add the Qdx-rootsignature-strip driver option (PR #154454)

2025-08-21 Thread Chris B via llvm-branch-commits


@@ -42,6 +42,19 @@ class LLVM_LIBRARY_VISIBILITY MetalConverter : public Tool {
 const llvm::opt::ArgList &TCArgs,
 const char *LinkingOutput) const override;
 };
+
+class LLVM_LIBRARY_VISIBILITY LLVMObjcopy : public Tool {
+public:
+  LLVMObjcopy(const ToolChain &TC)
+  : Tool("hlsl::LLVMObjcopy", "llvm-objcopy", TC) {}

llvm-beanz wrote:

I don't think the `hlsl` is relevant here. It's really just objcopy.
```suggestion
  : Tool("LLVMObjcopy", "llvm-objcopy", TC) {}
```

https://github.com/llvm/llvm-project/pull/154454


[llvm-branch-commits] [llvm] [SimplifyCFG] Set branch weights when merging conditional store to address (PR #154841)

2025-08-21 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. 
:warning:



You can test this locally with the following command:


```bash
git-clang-format --diff HEAD~1 HEAD --extensions h,cpp -- llvm/include/llvm/IR/ProfDataUtils.h llvm/lib/Transforms/Utils/SimplifyCFG.cpp
```





View the diff from clang-format here.


```diff
diff --git a/llvm/include/llvm/IR/ProfDataUtils.h 
b/llvm/include/llvm/IR/ProfDataUtils.h
index c9284c1bc..c5b792689 100644
--- a/llvm/include/llvm/IR/ProfDataUtils.h
+++ b/llvm/include/llvm/IR/ProfDataUtils.h
@@ -195,7 +195,7 @@ getDisjunctionWeights(const SmallVector &B1,
   const SmallVector &B2) {
   // the probability of the new branch being taken is:
   // P = p(b1) + p(b2) - p (b1 and b2)
-  // not P = p((not b1) and (not b2)) = 
+  // not P = p((not b1) and (not b2)) =
   //   = B1[1] / (B1[0]+B1[1]) * B2[1] / (B2[0]+B2[1]) =
   //   = B1[1] * B2[1] / (B1[0] + B1[1]) * (B2[0] + B2[1])
   // P = 1 - (not P)

```




https://github.com/llvm/llvm-project/pull/154841
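
The comment in the diff above derives the weights of the merged branch from the two original branches. A minimal sketch of that arithmetic follows. It assumes index 0 holds the taken weight and index 1 the not-taken weight (as the `B1[1] / (B1[0]+B1[1])` term suggests), and that the two conditions are independent. `disjunctionWeightsSketch` is a hypothetical name, not the `getDisjunctionWeights` from the patch, and overflow handling is ignored.

```cpp
#include <array>
#include <cstdint>

// Hypothetical illustration of the comment's formula:
//   not P = B1[1] * B2[1] / ((B1[0] + B1[1]) * (B2[0] + B2[1]))
//   P     = 1 - (not P)
// Both results share the denominator Total, so they can be returned
// directly as a {taken, not-taken} weight pair.
std::array<uint64_t, 2>
disjunctionWeightsSketch(const std::array<uint64_t, 2> &B1,
                         const std::array<uint64_t, 2> &B2) {
  uint64_t Total = (B1[0] + B1[1]) * (B2[0] + B2[1]);
  uint64_t NotTaken = B1[1] * B2[1];
  return {Total - NotTaken, NotTaken};
}
```

For weights {1, 3} and {1, 1} this gives {5, 3}, that is P = 5/8, which matches p(b1) + p(b2) - p(b1 and b2) = 1/4 + 1/2 - 1/8.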


[llvm-branch-commits] [flang] [flang][OpenMP] move omp end sections validation to semantics (PR #154740)

2025-08-21 Thread Tom Eccles via llvm-branch-commits

https://github.com/tblah updated 
https://github.com/llvm/llvm-project/pull/154740

>From f84070b2b0b1f446dded8a1c93267034169e39fb Mon Sep 17 00:00:00 2001
From: Tom Eccles 
Date: Wed, 20 Aug 2025 17:19:24 +
Subject: [PATCH 1/2] [flang][OpenMP] move omp end sections validation to
 semantics

See #90452. The old parse tree errors exploded to thousands of unhelpful
lines when there were multiple missing end directives.

Instead, allow a missing end directive in the parse tree then validate
that it is present during semantics (where the error messages are a lot
easier to control).
---
 flang/include/flang/Parser/parse-tree.h |  5 -
 flang/lib/Lower/OpenMP/OpenMP.cpp   |  7 +--
 flang/lib/Parser/openmp-parsers.cpp |  2 +-
 flang/lib/Parser/unparse.cpp|  2 +-
 flang/lib/Semantics/check-omp-structure.cpp | 17 +
 .../Semantics/OpenMP/missing-end-directive.f90  |  4 
 6 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/flang/include/flang/Parser/parse-tree.h 
b/flang/include/flang/Parser/parse-tree.h
index 1d1a4a163084b..4189a340d32cd 100644
--- a/flang/include/flang/Parser/parse-tree.h
+++ b/flang/include/flang/Parser/parse-tree.h
@@ -4893,8 +4893,11 @@ struct OpenMPSectionsConstruct {
   CharBlock source;
   // Each of the OpenMPConstructs in the list below contains an
   // OpenMPSectionConstruct. This is guaranteed by the parser.
+  // The end sections directive is optional here because it is difficult to
+  // generate helpful error messages for a missing end directive within the
+  // parser. Semantics will generate an error if this is absent.
   std::tuple,
-  OmpEndSectionsDirective>
+  std::optional>
   t;
 };
 
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp 
b/flang/lib/Lower/OpenMP/OpenMP.cpp
index ec2ec37e623f8..da5898480da22 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -3958,9 +3958,12 @@ static void genOMP(lower::AbstractConverter &converter, 
lower::SymMap &symTable,
   List clauses = makeClauses(
   std::get(beginSectionsDirective.t), semaCtx);
   const auto &endSectionsDirective =
-  std::get(sectionsConstruct.t);
+  std::get>(
+  sectionsConstruct.t);
+  assert(endSectionsDirective &&
+ "Missing end section directive should have been handled in 
semantics");
   clauses.append(makeClauses(
-  std::get(endSectionsDirective.t), semaCtx));
+  std::get(endSectionsDirective->t), semaCtx));
   mlir::Location currentLocation = converter.getCurrentLocation();
 
   llvm::omp::Directive directive =
diff --git a/flang/lib/Parser/openmp-parsers.cpp 
b/flang/lib/Parser/openmp-parsers.cpp
index 2093aca38c9d6..62630642edb53 100644
--- a/flang/lib/Parser/openmp-parsers.cpp
+++ b/flang/lib/Parser/openmp-parsers.cpp
@@ -1917,7 +1917,7 @@ TYPE_PARSER(sourced(construct(
 construct(maybe(sectionDir), block))),
 many(construct(
 sourced(construct(sectionDir, block),
-Parser{} / endOmpLine)))
+maybe(Parser{} / endOmpLine
 
 static bool IsExecutionPart(const OmpDirectiveName &name) {
   return name.IsExecutionPart();
diff --git a/flang/lib/Parser/unparse.cpp b/flang/lib/Parser/unparse.cpp
index 09dcfe60a46bc..87e699dbc4e8d 100644
--- a/flang/lib/Parser/unparse.cpp
+++ b/flang/lib/Parser/unparse.cpp
@@ -2788,7 +2788,7 @@ class UnparseVisitor {
 Walk(std::get>(x.t), "");
 BeginOpenMP();
 Word("!$OMP END ");
-Walk(std::get(x.t));
+Walk(std::get>(x.t));
 Put("\n");
 EndOpenMP();
   }
diff --git a/flang/lib/Semantics/check-omp-structure.cpp 
b/flang/lib/Semantics/check-omp-structure.cpp
index 835802d81894e..3533cd70f7ad9 100644
--- a/flang/lib/Semantics/check-omp-structure.cpp
+++ b/flang/lib/Semantics/check-omp-structure.cpp
@@ -1063,14 +1063,23 @@ void OmpStructureChecker::Leave(const 
parser::OmpBeginDirective &) {
 void OmpStructureChecker::Enter(const parser::OpenMPSectionsConstruct &x) {
   const auto &beginSectionsDir{
   std::get(x.t)};
-  const auto &endSectionsDir{std::get(x.t)};
+  const auto &endSectionsDir{
+  std::get>(x.t)};
   const auto &beginDir{
   std::get(beginSectionsDir.t)};
-  const auto &endDir{std::get(endSectionsDir.t)};
+  PushContextAndClauseSets(beginDir.source, beginDir.v);
+
+  if (!endSectionsDir) {
+context_.Say(beginSectionsDir.source,
+"Expected OpenMP END SECTIONS directive"_err_en_US);
+// Following code assumes the option is present.
+return;
+  }
+
+  const auto 
&endDir{std::get(endSectionsDir->t)};
   CheckMatching(beginDir, endDir);
 
-  PushContextAndClauseSets(beginDir.source, beginDir.v);
-  AddEndDirectiveClauses(std::get(endSectionsDir.t));
+  AddEndDirectiveClauses(std::get(endSectionsDir->t));
 
   const auto &sectionBlocks{std::get>(x.t)};
   for (const parser::OpenMPConstruct &construct : sectionBlocks) {
diff --git a/flang/test/Semantics/OpenMP/missing-end-directive.f90 
b/flang/te
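
The pattern in the patch above (an optional tuple member whose absence is diagnosed in one pass and then asserted in a later one) can be sketched outside of Flang. This is only an illustration under stated assumptions: `BeginDirective`, `EndDirective`, `SectionsConstruct`, `checkSections` and `endDirectiveName` are hypothetical stand-ins, not the Flang parse-tree types or checker API.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for the parse-tree nodes; not the Flang types.
struct BeginDirective { std::string name; };
struct EndDirective { std::string name; };

struct SectionsConstruct {
  // The end directive is optional so that the parser can accept the
  // construct and leave the diagnostic to a later pass.
  std::tuple<BeginDirective, std::optional<EndDirective>> t;
};

// Semantic check: returns the diagnostics produced for one construct.
std::vector<std::string> checkSections(const SectionsConstruct &x) {
  std::vector<std::string> errors;
  const auto &begin = std::get<BeginDirective>(x.t);
  const auto &end = std::get<std::optional<EndDirective>>(x.t);

  if (!end) {
    errors.push_back("Expected OpenMP END SECTIONS directive");
    return errors; // The code below dereferences `end`, so stop here.
  }
  if (begin.name != end->name)
    errors.push_back("END directive does not match " + begin.name);
  return errors;
}

// Later pass: by now the check above is assumed to have rejected a
// construct without an end directive, so its presence is only asserted.
const std::string &endDirectiveName(const SectionsConstruct &x) {
  const auto &end = std::get<std::optional<EndDirective>>(x.t);
  assert(end && "missing end directive should have been diagnosed earlier");
  return end->name;
}
```

The early return keeps every later dereference of the optional behind a single presence check, which is the same ordering the patch uses in `OmpStructureChecker::Enter`.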

[llvm-branch-commits] [llvm] [SimplifyCFG] Set branch weights when merging conditional store to address (PR #154841)

2025-08-21 Thread Mircea Trofin via llvm-branch-commits

mtrofin wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/154841?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#154841** 👈 (View in Graphite:
  https://app.graphite.dev/github/pr/llvm/llvm-project/154841?utm_source=stack-comment-view-in-graphite)
* **#154635**
* **#154426**
* `main`

This stack of pull requests is managed by Graphite
(https://graphite.dev?utm-source=stack-comment). Learn more about stacking:
https://stacking.dev/?utm_source=stack-comment

https://github.com/llvm/llvm-project/pull/154841
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [SimplifyCFG] Set branch weights when merging conditional store to address (PR #154841)

2025-08-21 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin created 
https://github.com/llvm/llvm-project/pull/154841

None

>From f4441cbe5e38f6abc76604a8049f6e36fb4881a7 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Thu, 21 Aug 2025 13:54:49 -0700
Subject: [PATCH] [SimplifyCFG] Set branch weights when merging conditional
 store to address

---
 llvm/include/llvm/IR/ProfDataUtils.h  | 22 +
 llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 39 +++
 2 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/llvm/include/llvm/IR/ProfDataUtils.h 
b/llvm/include/llvm/IR/ProfDataUtils.h
index 404875285beae..c9284c1bc8dde 100644
--- a/llvm/include/llvm/IR/ProfDataUtils.h
+++ b/llvm/include/llvm/IR/ProfDataUtils.h
@@ -15,6 +15,7 @@
 #ifndef LLVM_IR_PROFDATAUTILS_H
 #define LLVM_IR_PROFDATAUTILS_H
 
+#include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/IR/Metadata.h"
@@ -186,5 +187,26 @@ LLVM_ABI bool hasExplicitlyUnknownBranchWeights(const 
Instruction &I);
 /// Scaling the profile data attached to 'I' using the ratio of S/T.
 LLVM_ABI void scaleProfData(Instruction &I, uint64_t S, uint64_t T);
 
+/// get the branch weights of a branch conditioned on b1 || b2, where b1 and b2
+/// are 2 booleans that are the condition of 2 branches for which we have the
+/// branch weights B1 and B2, respectively.
+inline SmallVector
+getDisjunctionWeights(const SmallVector &B1,
+  const SmallVector &B2) {
+  // the probability of the new branch being taken is:
+  // P = p(b1) + p(b2) - p (b1 and b2)
+  // not P = p((not b1) and (not b2)) = 
+  //   = B1[1] / (B1[0]+B1[1]) * B2[1] / (B2[0]+B2[1]) =
+  //   = B1[1] * B2[1] / ((B1[0] + B1[1]) * (B2[0] + B2[1]))
+  // P = 1 - (not P)
+  // The numerator of P will be (B1[0] + B1[1]) * (B2[0] + B2[1]) - B1[1]*B2[1]
+  // ... which becomes what's shown below.
+  // We don't need the denominators, they are the same
+  assert(B1.size() == 2);
+  assert(B2.size() == 2);
+  auto FalseWeight = B1[1] * B2[1];
+  auto TrueWeight = B1[0] * B2[0] + B1[0] * B2[1] + B1[1] * B2[0];
+  return {TrueWeight, FalseWeight};
+}
 } // namespace llvm
 #endif
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp 
b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 4847add386dc4..e26a189564d13 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -1182,7 +1182,7 @@ static void 
cloneInstructionsIntoPredecessorBlockAndUpdateSSAUses(
 // only given the branch precondition.
 // Similarly strip attributes on call parameters that may cause UB in
 // location the call is moved to.
-NewBonusInst->dropUBImplyingAttrsAndMetadata();
+NewBonusInst->dropUBImplyingAttrsAndMetadata({LLVMContext::MD_prof});
 
 NewBonusInst->insertInto(PredBlock, PTI->getIterator());
 auto Range = NewBonusInst->cloneDebugInfoFrom(&BonusInst);
@@ -1808,7 +1808,8 @@ static void hoistConditionalLoadsStores(
 // !annotation: Not impact semantics. Keep it.
 if (const MDNode *Ranges = I->getMetadata(LLVMContext::MD_range))
   MaskedLoadStore->addRangeRetAttr(getConstantRangeFromMetadata(*Ranges));
-I->dropUBImplyingAttrsAndUnknownMetadata({LLVMContext::MD_annotation});
+I->dropUBImplyingAttrsAndUnknownMetadata(
+{LLVMContext::MD_annotation, LLVMContext::MD_prof});
 // FIXME: DIAssignID is not supported for masked store yet.
 // (Verifier::visitDIAssignIDMetadata)
 at::deleteAssignmentMarkers(I);
@@ -3366,7 +3367,7 @@ bool SimplifyCFGOpt::speculativelyExecuteBB(BranchInst 
*BI,
 if (!SpeculatedStoreValue || &I != SpeculatedStore) {
   I.setDebugLoc(DebugLoc::getDropped());
 }
-I.dropUBImplyingAttrsAndMetadata();
+I.dropUBImplyingAttrsAndMetadata({LLVMContext::MD_prof});
 
 // Drop ephemeral values.
 if (EphTracker.contains(&I)) {
@@ -4404,10 +4405,12 @@ static bool mergeConditionalStoreToAddress(
 
   // OK, we're going to sink the stores to PostBB. The store has to be
   // conditional though, so first create the predicate.
-  Value *PCond = cast(PFB->getSinglePredecessor()->getTerminator())
- ->getCondition();
-  Value *QCond = cast(QFB->getSinglePredecessor()->getTerminator())
- ->getCondition();
+  BranchInst *const PBranch =
+  cast(PFB->getSinglePredecessor()->getTerminator());
+  BranchInst *const QBranch =
+  cast(QFB->getSinglePredecessor()->getTerminator());
+  Value *const PCond = PBranch->getCondition();
+  Value *const QCond = QBranch->getCondition();
 
   Value *PPHI = ensureValueAvailableInSuccessor(PStore->getValueOperand(),
 PStore->getParent());
@@ -4418,19 +4421,29 @@ static bool mergeConditionalStoreToAddress(
   IRBuilder<> QB(PostBB, PostBBFirst);
   QB.SetCurrentDebugLocation(PostBBFirst->getStableDebugLoc());
 
-  Value *PPred = PStore->getParent() == PTB ? PCond : QB.CreateNot(PCond);
-  Va
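
To make the arithmetic in the `getDisjunctionWeights` comment concrete, here is a small standalone sketch of the same formula. It is not the LLVM helper: `Weights` and `disjunctionWeights` are hypothetical names, the 64-bit element type is an assumption of this sketch, and the derivation treats the two conditions as independent, as the product P(not b1) * P(not b2) in the patch comment implies.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Weights are {taken, not taken}. The 64-bit element type is an assumption
// of this sketch; very large weights could still overflow the products.
using Weights = std::array<uint64_t, 2>;

// Weights for a branch on (b1 || b2), treating b1 and b2 as independent.
// P(false) = P(!b1) * P(!b2). Both results share the denominator
// (B1[0] + B1[1]) * (B2[0] + B2[1]), so only the numerators are returned.
Weights disjunctionWeights(const Weights &B1, const Weights &B2) {
  assert(B1[0] + B1[1] != 0 && B2[0] + B2[1] != 0 &&
         "each branch needs at least one non-zero weight");
  uint64_t FalseWeight = B1[1] * B2[1];
  uint64_t TrueWeight = B1[0] * B2[0] + B1[0] * B2[1] + B1[1] * B2[0];
  return {TrueWeight, FalseWeight};
}
```

For example, with B1 = {3, 1} (taken 3/4 of the time) and B2 = {1, 1} (taken 1/2 of the time), the function returns {7, 1}: the merged branch is taken with probability 7/8, which matches 1 - (1/4)(1/2).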

[llvm-branch-commits] [libc] [libc][math][c++23] Add {get, set}payloadbf16 and setpayloadsigbf16 math functions (PR #153994)

2025-08-21 Thread via llvm-branch-commits

https://github.com/overmighty approved this pull request.


https://github.com/llvm/llvm-project/pull/153994
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libc] [libc][math][c++23] Add {get, set}payloadbf16 and setpayloadsigbf16 math functions (PR #153994)

2025-08-21 Thread via llvm-branch-commits


@@ -51,23 +51,50 @@ class GetPayloadTestTemplate : public 
LIBC_NAMESPACE::testing::FEnvSafeTest {
 EXPECT_FP_EQ(default_snan_payload, funcWrapper(func, sNaN));
 EXPECT_FP_EQ(default_snan_payload, funcWrapper(func, neg_sNaN));
 
-T qnan_42 = FPBits::quiet_nan(Sign::POS, 0x42).get_val();
-T neg_qnan_42 = FPBits::quiet_nan(Sign::NEG, 0x42).get_val();
-T snan_42 = FPBits::signaling_nan(Sign::POS, 0x42).get_val();
-T neg_snan_42 = FPBits::signaling_nan(Sign::NEG, 0x42).get_val();
-EXPECT_FP_EQ(T(0x42.0p+0), funcWrapper(func, qnan_42));
-EXPECT_FP_EQ(T(0x42.0p+0), funcWrapper(func, neg_qnan_42));
-EXPECT_FP_EQ(T(0x42.0p+0), funcWrapper(func, snan_42));
-EXPECT_FP_EQ(T(0x42.0p+0), funcWrapper(func, neg_snan_42));
-
-T qnan_123 = FPBits::quiet_nan(Sign::POS, 0x123).get_val();
-T neg_qnan_123 = FPBits::quiet_nan(Sign::NEG, 0x123).get_val();
-T snan_123 = FPBits::signaling_nan(Sign::POS, 0x123).get_val();
-T neg_snan_123 = FPBits::signaling_nan(Sign::NEG, 0x123).get_val();
-EXPECT_FP_EQ(T(0x123.0p+0), funcWrapper(func, qnan_123));
-EXPECT_FP_EQ(T(0x123.0p+0), funcWrapper(func, neg_qnan_123));
-EXPECT_FP_EQ(T(0x123.0p+0), funcWrapper(func, snan_123));
-EXPECT_FP_EQ(T(0x123.0p+0), funcWrapper(func, neg_snan_123));
+if constexpr (FPBits::FRACTION_LEN - 1 >= 6) {

overmighty wrote:

Nit: it's a bit weird that you guard both 6-bit and 5-bit payload test cases 
with a single `FRACTION_LEN - 1 >= 6` check but you guard 7-bit and 9-bit 
payload test cases with separate `FRACTION_LEN - 1 >= 7` and `>= 9` checks.
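
One way to read the suggestion is that each payload width gets its own guard. The sketch below only illustrates that shape under stated assumptions: `FakeFPBits` and `guardedCaseCount` are hypothetical, and the real test would run `EXPECT_FP_EQ` checks inside each guard rather than count them.

```cpp
// Hypothetical stand-in for an FPBits-like traits type; only FRACTION_LEN
// is modelled here.
template <int FractionLen> struct FakeFPBits {
  static constexpr int FRACTION_LEN = FractionLen;
};

// Counts how many payload test groups apply to a type, with one guard per
// payload width. One fraction bit is reserved for the quiet bit, hence the
// "- 1".
template <typename Bits> constexpr int guardedCaseCount() {
  constexpr int PayloadBits = Bits::FRACTION_LEN - 1;
  int Count = 0;
  if constexpr (PayloadBits >= 5)
    ++Count; // 5-bit payload cases
  if constexpr (PayloadBits >= 6)
    ++Count; // 6-bit payload cases
  if constexpr (PayloadBits >= 7)
    ++Count; // 7-bit payload cases, e.g. 0x42
  if constexpr (PayloadBits >= 9)
    ++Count; // 9-bit payload cases, e.g. 0x123
  return Count;
}
```

With 7 fraction bits (the bfloat16 layout) only the 5-bit and 6-bit groups are enabled; with 10 fraction bits (binary16) all four are.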

https://github.com/llvm/llvm-project/pull/153994
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/21.x: [clang] Align cleanup structs to prevent SIGBUS on sparc32 (#152866) (PR #154002)

2025-08-21 Thread Eli Friedman via llvm-branch-commits

https://github.com/efriedma-quic edited 
https://github.com/llvm/llvm-project/pull/154002
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/21.x: [clang] Align cleanup structs to prevent SIGBUS on sparc32 (#152866) (PR #154002)

2025-08-21 Thread Eli Friedman via llvm-branch-commits

https://github.com/efriedma-quic approved this pull request.

LGTM

On a side-note, is it a known issue that the "CI Checks / MacOS Premerge Checks 
(pull_request)" check is broken?  It says it's passing, but the log says it 
isn't actually building anything.

https://github.com/llvm/llvm-project/pull/154002
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DirectX] Add `split-section` to `llvm-objcopy` and implement it for `DXContainer` (PR #153265)

2025-08-21 Thread Finn Plummer via llvm-branch-commits

https://github.com/inbelic closed 
https://github.com/llvm/llvm-project/pull/153265
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DirectX] Add `split-section` to `llvm-objcopy` and implement it for `DXContainer` (PR #153265)

2025-08-21 Thread Finn Plummer via llvm-branch-commits

inbelic wrote:

Closing this PR in favour of https://github.com/llvm/llvm-project/pull/154804.

https://github.com/llvm/llvm-project/pull/153265
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Precommit memory legalizer tests for private AS (PR #154709)

2025-08-21 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec commented:

Basically, these tests do not render in the GitHub web UI, and I do not
understand which branch this PR targets. I guess you can commit changes to your
private branch without review.

https://github.com/llvm/llvm-project/pull/154709
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DirectX] Validating Root flags are denying shader stage (PR #153287)

2025-08-21 Thread Finn Plummer via llvm-branch-commits


@@ -205,7 +205,57 @@ getRootDescriptorsBindingInfo(const 
mcdxbc::RootSignatureDesc &RSD,
   return RDs;
 }
 
-static void validateRootSignatureBindings(Module &M,
+static void reportIfDeniedShaderStageAccess(Module &M, dxbc::RootFlags Flags,
+dxbc::RootFlags Mask) {
+  if ((Flags & Mask) == Mask) {
+SmallString<128> Message;
+raw_svector_ostream OS(Message);
+OS << "Shader has root bindings but root signature uses a DENY flag to "
+  "disallow root binding access to the shader stage.";
+M.getContext().diagnose(DiagnosticInfoGeneric(Message));
+  }
+}
+
+static void validateDeniedStagedNotInUse(Module &M,
+ const mcdxbc::RootSignatureDesc &RSD,
+ const dxil::ModuleMetadataInfo &MMI) {
+  dxbc::RootFlags Flags = dxbc::RootFlags(RSD.Flags);
+

inbelic wrote:

Can we add a validation to `RootsignatureValidations` like 
`validateShaderStage(Flags, ShaderProfile)` so that it can be re-used in the 
frontend?
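
A reusable check of that kind could look roughly like the sketch below. Everything in it is an assumption for illustration: `ShaderStage`, `RootFlag`, `denyMaskFor` and `isStageDenied` are hypothetical names, and the bit values are not copied from the LLVM headers. Only the `(Flags & Mask) == Mask` test is taken from the patch.

```cpp
#include <cstdint>

// Hypothetical stage and flag enums; the bit values are illustrative only.
enum class ShaderStage { Vertex, Pixel, Compute };
enum RootFlag : uint32_t {
  DenyVertexAccess = 1u << 0,
  DenyPixelAccess = 1u << 1,
};

// Mask of DENY bits that apply to a stage; 0 means this sketch models no
// such flag for the stage.
constexpr uint32_t denyMaskFor(ShaderStage Stage) {
  switch (Stage) {
  case ShaderStage::Vertex:
    return DenyVertexAccess;
  case ShaderStage::Pixel:
    return DenyPixelAccess;
  case ShaderStage::Compute:
    return 0;
  }
  return 0;
}

// True when the flags deny root binding access to the given stage. Uses the
// same (Flags & Mask) == Mask test as the patch, plus a guard so that an
// empty mask never reports a denial.
constexpr bool isStageDenied(uint32_t Flags, ShaderStage Stage) {
  const uint32_t Mask = denyMaskFor(Stage);
  return Mask != 0 && (Flags & Mask) == Mask;
}
```

Keeping the function free of `Module` and diagnostic types is what would let both the frontend and the backend call it and report the error in their own way.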

https://github.com/llvm/llvm-project/pull/153287
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] MLIR bug fixes for LLVM 21.x release (PR #154587)

2025-08-21 Thread Mehdi Amini via llvm-branch-commits

https://github.com/joker-eph milestoned 
https://github.com/llvm/llvm-project/pull/154587
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/21.x: ThinLTOBitcodeWriter: Emit __cfi_check to full LTO part of bitcode file. (PR #154859)

2025-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-transforms

Author: None (llvmbot)


Changes

Backport ff85dbdf6b399eac7bffa13e579f0f5e6edac3c0

Requested by: @efriedma-quic

---
Full diff: https://github.com/llvm/llvm-project/pull/154859.diff


2 Files Affected:

- (modified) llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp (+11-1) 
- (added) llvm/test/Transforms/ThinLTOBitcodeWriter/cfi-check.ll (+19) 


```diff
diff --git a/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp 
b/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
index e276376f21583..0d631734ca968 100644
--- a/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
+++ b/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
@@ -350,12 +350,20 @@ void splitAndWriteThinLTOBitcode(
   });
 }
 
+  auto MustEmitToMergedModule = [](const GlobalValue *GV) {
+// The __cfi_check definition is filled in by the CrossDSOCFI pass which
+// runs only in the merged module.
+return GV->getName() == "__cfi_check";
+  };
+
   ValueToValueMapTy VMap;
   std::unique_ptr MergedM(
   CloneModule(M, VMap, [&](const GlobalValue *GV) -> bool {
 if (const auto *C = GV->getComdat())
   if (MergedMComdats.count(C))
 return true;
+if (MustEmitToMergedModule(GV))
+  return true;
 if (auto *F = dyn_cast(GV))
   return EligibleVirtualFns.count(F);
 if (auto *GVar =
@@ -372,7 +380,7 @@ void splitAndWriteThinLTOBitcode(
   cloneUsedGlobalVariables(M, *MergedM, /*CompilerUsed*/ true);
 
   for (Function &F : *MergedM)
-if (!F.isDeclaration()) {
+if (!F.isDeclaration() && !MustEmitToMergedModule(&F)) {
   // Reset the linkage of all functions eligible for virtual constant
   // propagation. The canonical definitions live in the thin LTO module so
   // that they can be imported.
@@ -394,6 +402,8 @@ void splitAndWriteThinLTOBitcode(
 if (const auto *C = GV->getComdat())
   if (MergedMComdats.count(C))
 return false;
+if (MustEmitToMergedModule(GV))
+  return false;
 return true;
   });
 
diff --git a/llvm/test/Transforms/ThinLTOBitcodeWriter/cfi-check.ll 
b/llvm/test/Transforms/ThinLTOBitcodeWriter/cfi-check.ll
new file mode 100644
index 0..b927af6b92f7c
--- /dev/null
+++ b/llvm/test/Transforms/ThinLTOBitcodeWriter/cfi-check.ll
@@ -0,0 +1,19 @@
+; RUN: opt -thinlto-bc -thinlto-split-lto-unit -o %t %s
+; RUN: llvm-modextract -b -n 0 -o - %t | llvm-dis | FileCheck 
--check-prefix=M0 %s
+; RUN: llvm-modextract -b -n 1 -o - %t | llvm-dis | FileCheck 
--check-prefix=M1 %s
+
+; Check that __cfi_check is emitted on the full LTO side with
+; attributes preserved.
+
+; M0: define void @f()
+define void @f() !type !{!"f1", i32 0} {
+  ret void 
+}
+
+; M1: define void @__cfi_check() #0
+define void @__cfi_check() #0 {
+  ret void
+}
+
+; M1: attributes #0 = { "branch-target-enforcement" }
+attributes #0 = { "branch-target-enforcement" }

```




https://github.com/llvm/llvm-project/pull/154859
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/21.x: ThinLTOBitcodeWriter: Emit __cfi_check to full LTO part of bitcode file. (PR #154859)

2025-08-21 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/154859

Backport ff85dbdf6b399eac7bffa13e579f0f5e6edac3c0

Requested by: @efriedma-quic

>From e887b28b9705a9a2b3c9af30ea0e5aba01f3b598 Mon Sep 17 00:00:00 2001
From: Peter Collingbourne 
Date: Thu, 21 Aug 2025 16:31:32 -0700
Subject: [PATCH] ThinLTOBitcodeWriter: Emit __cfi_check to full LTO part of
 bitcode file.

The CrossDSOCFI pass runs on the full LTO module and fills in the
body of __cfi_check. This function must have the correct attributes in
order to be compatible with the rest of the program. For example, when
building with -mbranch-protection=standard, the function must have the
branch-target-enforcement attribute, which is normally added by Clang.
When __cfi_check is missing, CrossDSOCFI will give it the default set
of attributes, which are likely incorrect. Therefore, emit __cfi_check
to the full LTO part, where CrossDSOCFI will see it.

Reviewers: efriedma-quic, vitalybuka, fmayer

Reviewed By: efriedma-quic

Pull Request: https://github.com/llvm/llvm-project/pull/154833

(cherry picked from commit ff85dbdf6b399eac7bffa13e579f0f5e6edac3c0)
---
 .../Transforms/IPO/ThinLTOBitcodeWriter.cpp   | 12 +++-
 .../ThinLTOBitcodeWriter/cfi-check.ll | 19 +++
 2 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/Transforms/ThinLTOBitcodeWriter/cfi-check.ll

diff --git a/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp 
b/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
index e276376f21583..0d631734ca968 100644
--- a/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
+++ b/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp
@@ -350,12 +350,20 @@ void splitAndWriteThinLTOBitcode(
   });
 }
 
+  auto MustEmitToMergedModule = [](const GlobalValue *GV) {
+// The __cfi_check definition is filled in by the CrossDSOCFI pass which
+// runs only in the merged module.
+return GV->getName() == "__cfi_check";
+  };
+
   ValueToValueMapTy VMap;
   std::unique_ptr MergedM(
   CloneModule(M, VMap, [&](const GlobalValue *GV) -> bool {
 if (const auto *C = GV->getComdat())
   if (MergedMComdats.count(C))
 return true;
+if (MustEmitToMergedModule(GV))
+  return true;
 if (auto *F = dyn_cast(GV))
   return EligibleVirtualFns.count(F);
 if (auto *GVar =
@@ -372,7 +380,7 @@ void splitAndWriteThinLTOBitcode(
   cloneUsedGlobalVariables(M, *MergedM, /*CompilerUsed*/ true);
 
   for (Function &F : *MergedM)
-if (!F.isDeclaration()) {
+if (!F.isDeclaration() && !MustEmitToMergedModule(&F)) {
   // Reset the linkage of all functions eligible for virtual constant
   // propagation. The canonical definitions live in the thin LTO module so
   // that they can be imported.
@@ -394,6 +402,8 @@ void splitAndWriteThinLTOBitcode(
 if (const auto *C = GV->getComdat())
   if (MergedMComdats.count(C))
 return false;
+if (MustEmitToMergedModule(GV))
+  return false;
 return true;
   });
 
diff --git a/llvm/test/Transforms/ThinLTOBitcodeWriter/cfi-check.ll 
b/llvm/test/Transforms/ThinLTOBitcodeWriter/cfi-check.ll
new file mode 100644
index 0..b927af6b92f7c
--- /dev/null
+++ b/llvm/test/Transforms/ThinLTOBitcodeWriter/cfi-check.ll
@@ -0,0 +1,19 @@
+; RUN: opt -thinlto-bc -thinlto-split-lto-unit -o %t %s
+; RUN: llvm-modextract -b -n 0 -o - %t | llvm-dis | FileCheck 
--check-prefix=M0 %s
+; RUN: llvm-modextract -b -n 1 -o - %t | llvm-dis | FileCheck 
--check-prefix=M1 %s
+
+; Check that __cfi_check is emitted on the full LTO side with
+; attributes preserved.
+
+; M0: define void @f()
+define void @f() !type !{!"f1", i32 0} {
+  ret void 
+}
+
+; M1: define void @__cfi_check() #0
+define void @__cfi_check() #0 {
+  ret void
+}
+
+; M1: attributes #0 = { "branch-target-enforcement" }
+attributes #0 = { "branch-target-enforcement" }
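
The effect of the `MustEmitToMergedModule` predicate on the two halves of the split can be modelled with a pair of plain predicates. This is a simplified sketch, not the LLVM code: `Global`, `keepInMerged` and `keepInThin` are hypothetical, and the global-variable branch of the real clone predicate is left out.

```cpp
#include <string_view>

// Simplified model of a global value; not the LLVM GlobalValue class.
struct Global {
  std::string_view Name;
  bool InMergedComdat = false;
  bool EligibleVirtualFn = false;
};

// Mirrors the lambda in the patch: __cfi_check is filled in by a pass that
// runs only on the merged (full LTO) module.
constexpr bool mustEmitToMergedModule(const Global &G) {
  return G.Name == "__cfi_check";
}

// Should the definition be cloned into the merged module?
constexpr bool keepInMerged(const Global &G) {
  if (G.InMergedComdat)
    return true;
  if (mustEmitToMergedModule(G))
    return true;
  return G.EligibleVirtualFn;
}

// Should the definition stay in the thin LTO module?
constexpr bool keepInThin(const Global &G) {
  if (G.InMergedComdat)
    return false;
  if (mustEmitToMergedModule(G))
    return false;
  return true;
}
```

In this model an ordinary function such as `f` stays only in the thin part, while `__cfi_check` lands only in the merged part, which is what the `M0` and `M1` check lines in the test expect.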

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DirectX] Removing dxbc RootSignature and RootDescriptor from mcbxdc (PR #154585)

2025-08-21 Thread Finn Plummer via llvm-branch-commits

https://github.com/inbelic edited 
https://github.com/llvm/llvm-project/pull/154585
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/21.x: ThinLTOBitcodeWriter: Emit __cfi_check to full LTO part of bitcode file. (PR #154859)

2025-08-21 Thread via llvm-branch-commits

llvmbot wrote:

@efriedma-quic What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/154859
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/21.x: ThinLTOBitcodeWriter: Emit __cfi_check to full LTO part of bitcode file. (PR #154859)

2025-08-21 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/154859
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DirectX] Removing dxbc RootSignature and RootDescriptor from mcbxdc (PR #154585)

2025-08-21 Thread Finn Plummer via llvm-branch-commits

https://github.com/inbelic commented:

LGTM, but the same note about a dependency applies:
https://github.com/llvm/llvm-project/pull/154249#discussion_r2292364661

https://github.com/llvm/llvm-project/pull/154585
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] ba25381 - Revert "Fix Debug Build Using GCC 15 (#152223)"

2025-08-21 Thread via llvm-branch-commits

Author: dpalermo
Date: 2025-08-21T21:53:01-05:00
New Revision: ba25381d5aca817e12b2e222d6c13fc4f49cbcc1

URL: 
https://github.com/llvm/llvm-project/commit/ba25381d5aca817e12b2e222d6c13fc4f49cbcc1
DIFF: 
https://github.com/llvm/llvm-project/commit/ba25381d5aca817e12b2e222d6c13fc4f49cbcc1.diff

LOG: Revert "Fix Debug Build Using GCC 15 (#152223)"

This reverts commit 304373fb6d03531e62cf7cb1321705259a951fc1.

Added: 


Modified: 
flang-rt/lib/runtime/CMakeLists.txt
flang/lib/Optimizer/Builder/CMakeLists.txt
flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt
openmp/runtime/src/CMakeLists.txt

Removed: 




diff  --git a/flang-rt/lib/runtime/CMakeLists.txt 
b/flang-rt/lib/runtime/CMakeLists.txt
index 08db8b2e3a4db..dc2db1d9902cb 100644
--- a/flang-rt/lib/runtime/CMakeLists.txt
+++ b/flang-rt/lib/runtime/CMakeLists.txt
@@ -183,10 +183,6 @@ endif ()
 
 
 if (NOT WIN32)
-  add_definitions(-U_GLIBCXX_ASSERTIONS -D_GLIBCXX_NO_ASSERTIONS)
-  add_compile_options($<$:-fno-exceptions>)
-  add_compile_options($<$:-O2>)
-
   add_flangrt_library(flang_rt.runtime STATIC SHARED
 ${sources}
 LINK_LIBRARIES ${Backtrace_LIBRARY}

diff  --git a/flang/lib/Optimizer/Builder/CMakeLists.txt 
b/flang/lib/Optimizer/Builder/CMakeLists.txt
index 404afd185fd31..8fb36a750d433 100644
--- a/flang/lib/Optimizer/Builder/CMakeLists.txt
+++ b/flang/lib/Optimizer/Builder/CMakeLists.txt
@@ -50,7 +50,6 @@ add_flang_library(FIRBuilder
   FIRDialectSupport
   FIRSupport
   FortranEvaluate
-  FortranSupport
   HLFIRDialect
 
   MLIR_DEPS

diff  --git a/flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt 
b/flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt
index 3775a13e31e95..cc74273d9c5d9 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt
+++ b/flang/lib/Optimizer/HLFIR/Transforms/CMakeLists.txt
@@ -27,8 +27,6 @@ add_flang_library(HLFIRTransforms
   FIRSupport
   FIRTransforms
   FlangOpenMPTransforms
-  FortranEvaluate
-  FortranSupport
   HLFIRDialect
 
   LINK_COMPONENTS

diff  --git a/openmp/runtime/src/CMakeLists.txt 
b/openmp/runtime/src/CMakeLists.txt
index 71eab0eedccef..08e1753b93636 100644
--- a/openmp/runtime/src/CMakeLists.txt
+++ b/openmp/runtime/src/CMakeLists.txt
@@ -168,7 +168,7 @@ endif()
 # Disable libstdc++ assertions, even in an LLVM_ENABLE_ASSERTIONS build, to
 # avoid an unwanted dependency on libstdc++.so.
 if(NOT WIN32)
-  add_definitions(-U_GLIBCXX_ASSERTIONS -D_GLIBCXX_NO_ASSERTIONS)
+  add_definitions(-U_GLIBCXX_ASSERTIONS)
 endif()
 
 # Add the OpenMP library



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [flang][openmp] Add Lowering to omp mlir for workdistribute construct (PR #154378)

2025-08-21 Thread via llvm-branch-commits


@@ -0,0 +1,30 @@
+! RUN: %flang_fc1 -emit-hlfir -fopenmp %s -o - | FileCheck %s

skc7 wrote:

As per spec, "The binding region is the innermost enclosing `teams` region" for 
`workdistribute`. 
I have added a semantics check in #154377 that reports an error if
`workdistribute` is not nested under a `teams` region.

https://github.com/llvm/llvm-project/pull/154378
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Expand scratch atomics to flat atomics if GAS is enabled (PR #154710)

2025-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-transforms

Author: Pierre van Houtryve (Pierre-vh)


Changes



---

Patch is 1.02 MiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/154710.diff


9 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+46-4) 
- (modified) llvm/lib/Target/AMDGPU/SIISelLowering.h (+2) 
- (modified) llvm/test/CodeGen/AMDGPU/gfx1250-no-scope-cu-stores.ll (-12) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll 
(+3235-504) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll 
(+2892-540) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll 
(+3131-475) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll 
(+2892-540) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll 
(+2938-540) 
- (added) llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-private-gas.ll 
(+172) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 561019bb65549..60faf211df0d9 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -17808,11 +17808,19 @@ static bool flatInstrMayAccessPrivate(const 
Instruction *I) {
  !AMDGPU::hasValueInRangeLikeMetadata(*MD, AMDGPUAS::PRIVATE_ADDRESS);
 }
 
+static TargetLowering::AtomicExpansionKind
+getPrivateAtomicExpansionKind(const GCNSubtarget &STI) {
+  // For GAS, lower to flat atomic.
+  return STI.hasGloballyAddressableScratch()
+ ? TargetLowering::AtomicExpansionKind::Expand
+ : TargetLowering::AtomicExpansionKind::NotAtomic;
+}
+
 TargetLowering::AtomicExpansionKind
 SITargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *RMW) const {
   unsigned AS = RMW->getPointerAddressSpace();
   if (AS == AMDGPUAS::PRIVATE_ADDRESS)
-return AtomicExpansionKind::NotAtomic;
+return getPrivateAtomicExpansionKind(*getSubtarget());
 
   // 64-bit flat atomics that dynamically reside in private memory will 
silently
   // be dropped.
@@ -18038,14 +18046,14 @@ 
SITargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *RMW) const {
 TargetLowering::AtomicExpansionKind
 SITargetLowering::shouldExpandAtomicLoadInIR(LoadInst *LI) const {
   return LI->getPointerAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS
- ? AtomicExpansionKind::NotAtomic
+ ? getPrivateAtomicExpansionKind(*getSubtarget())
  : AtomicExpansionKind::None;
 }
 
 TargetLowering::AtomicExpansionKind
 SITargetLowering::shouldExpandAtomicStoreInIR(StoreInst *SI) const {
   return SI->getPointerAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS
- ? AtomicExpansionKind::NotAtomic
+ ? getPrivateAtomicExpansionKind(*getSubtarget())
  : AtomicExpansionKind::None;
 }
 
@@ -18053,7 +18061,7 @@ TargetLowering::AtomicExpansionKind
 SITargetLowering::shouldExpandAtomicCmpXchgInIR(AtomicCmpXchgInst *CmpX) const 
{
   unsigned AddrSpace = CmpX->getPointerAddressSpace();
   if (AddrSpace == AMDGPUAS::PRIVATE_ADDRESS)
-return AtomicExpansionKind::NotAtomic;
+return getPrivateAtomicExpansionKind(*getSubtarget());
 
   if (AddrSpace != AMDGPUAS::FLAT_ADDRESS || !flatInstrMayAccessPrivate(CmpX))
 return AtomicExpansionKind::None;
@@ -18423,9 +18431,24 @@ void 
SITargetLowering::emitExpandAtomicAddrSpacePredicate(
   Builder.CreateBr(ExitBB);
 }
 
+static void convertScratchAtomicToFlatAtomic(Instruction *I,
+ unsigned PtrOpIdx) {
+  Value *PtrOp = I->getOperand(PtrOpIdx);
+  assert(PtrOp->getType()->getPointerAddressSpace() ==
+ AMDGPUAS::PRIVATE_ADDRESS);
+
+  Type *FlatPtr = PointerType::get(I->getContext(), AMDGPUAS::FLAT_ADDRESS);
+  Value *ASCast = CastInst::CreatePointerCast(PtrOp, FlatPtr, "scratch.ascast",
+  I->getIterator());
+  I->setOperand(PtrOpIdx, ASCast);
+}
+
 void SITargetLowering::emitExpandAtomicRMW(AtomicRMWInst *AI) const {
   AtomicRMWInst::BinOp Op = AI->getOperation();
 
+  if (AI->getPointerAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS)
+return convertScratchAtomicToFlatAtomic(AI, AI->getPointerOperandIndex());
+
   if (Op == AtomicRMWInst::Sub || Op == AtomicRMWInst::Or ||
   Op == AtomicRMWInst::Xor) {
 if (const auto *ConstVal = dyn_cast(AI->getValOperand());
@@ -18448,9 +18471,28 @@ void 
SITargetLowering::emitExpandAtomicRMW(AtomicRMWInst *AI) const {
 }
 
 void SITargetLowering::emitExpandAtomicCmpXchg(AtomicCmpXchgInst *CI) const {
+  if (CI->getPointerAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS)
+return convertScratchAtomicToFlatAtomic(CI, CI->getPointerOperandIndex());
+
   emitExpandAtomicAddrSpacePredicate(CI);
 }
 
+void SITargetLowering::emitExpandAtomicLoad(LoadInst *LI) const {
+  if (LI->getPointerAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS)
+return convertScratchAtomicToFla

[llvm-branch-commits] [llvm] [AMDGPU] Expand scratch atomics to flat atomics if GAS is enabled (PR #154710)

2025-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)


Changes



---

Patch is 1.02 MiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/154710.diff


9 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+46-4) 
- (modified) llvm/lib/Target/AMDGPU/SIISelLowering.h (+2) 
- (modified) llvm/test/CodeGen/AMDGPU/gfx1250-no-scope-cu-stores.ll (-12) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll 
(+3235-504) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll 
(+2892-540) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll 
(+3131-475) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll 
(+2892-540) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll 
(+2938-540) 
- (added) llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-private-gas.ll 
(+172) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 561019bb65549..60faf211df0d9 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -17808,11 +17808,19 @@ static bool flatInstrMayAccessPrivate(const 
Instruction *I) {
  !AMDGPU::hasValueInRangeLikeMetadata(*MD, AMDGPUAS::PRIVATE_ADDRESS);
 }
 
+static TargetLowering::AtomicExpansionKind
+getPrivateAtomicExpansionKind(const GCNSubtarget &STI) {
+  // For GAS, lower to flat atomic.
+  return STI.hasGloballyAddressableScratch()
+             ? TargetLowering::AtomicExpansionKind::Expand
+             : TargetLowering::AtomicExpansionKind::NotAtomic;
+}
+
 TargetLowering::AtomicExpansionKind
 SITargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *RMW) const {
   unsigned AS = RMW->getPointerAddressSpace();
   if (AS == AMDGPUAS::PRIVATE_ADDRESS)
-return AtomicExpansionKind::NotAtomic;
+return getPrivateAtomicExpansionKind(*getSubtarget());
 
   // 64-bit flat atomics that dynamically reside in private memory will 
silently
   // be dropped.
@@ -18038,14 +18046,14 @@ 
SITargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *RMW) const {
 TargetLowering::AtomicExpansionKind
 SITargetLowering::shouldExpandAtomicLoadInIR(LoadInst *LI) const {
   return LI->getPointerAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS
- ? AtomicExpansionKind::NotAtomic
+ ? getPrivateAtomicExpansionKind(*getSubtarget())
  : AtomicExpansionKind::None;
 }
 
 TargetLowering::AtomicExpansionKind
 SITargetLowering::shouldExpandAtomicStoreInIR(StoreInst *SI) const {
   return SI->getPointerAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS
- ? AtomicExpansionKind::NotAtomic
+ ? getPrivateAtomicExpansionKind(*getSubtarget())
  : AtomicExpansionKind::None;
 }
 
@@ -18053,7 +18061,7 @@ TargetLowering::AtomicExpansionKind
 SITargetLowering::shouldExpandAtomicCmpXchgInIR(AtomicCmpXchgInst *CmpX) const 
{
   unsigned AddrSpace = CmpX->getPointerAddressSpace();
   if (AddrSpace == AMDGPUAS::PRIVATE_ADDRESS)
-return AtomicExpansionKind::NotAtomic;
+return getPrivateAtomicExpansionKind(*getSubtarget());
 
   if (AddrSpace != AMDGPUAS::FLAT_ADDRESS || !flatInstrMayAccessPrivate(CmpX))
 return AtomicExpansionKind::None;
@@ -18423,9 +18431,24 @@ void 
SITargetLowering::emitExpandAtomicAddrSpacePredicate(
   Builder.CreateBr(ExitBB);
 }
 
+static void convertScratchAtomicToFlatAtomic(Instruction *I,
+                                             unsigned PtrOpIdx) {
+  Value *PtrOp = I->getOperand(PtrOpIdx);
+  assert(PtrOp->getType()->getPointerAddressSpace() ==
+         AMDGPUAS::PRIVATE_ADDRESS);
+
+  Type *FlatPtr = PointerType::get(I->getContext(), AMDGPUAS::FLAT_ADDRESS);
+  Value *ASCast = CastInst::CreatePointerCast(PtrOp, FlatPtr, "scratch.ascast",
+                                              I->getIterator());
+  I->setOperand(PtrOpIdx, ASCast);
+}
+
 void SITargetLowering::emitExpandAtomicRMW(AtomicRMWInst *AI) const {
   AtomicRMWInst::BinOp Op = AI->getOperation();
 
+  if (AI->getPointerAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS)
+return convertScratchAtomicToFlatAtomic(AI, AI->getPointerOperandIndex());
+
   if (Op == AtomicRMWInst::Sub || Op == AtomicRMWInst::Or ||
   Op == AtomicRMWInst::Xor) {
 if (const auto *ConstVal = dyn_cast(AI->getValOperand());
@@ -18448,9 +18471,28 @@ void 
SITargetLowering::emitExpandAtomicRMW(AtomicRMWInst *AI) const {
 }
 
 void SITargetLowering::emitExpandAtomicCmpXchg(AtomicCmpXchgInst *CI) const {
+  if (CI->getPointerAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS)
+return convertScratchAtomicToFlatAtomic(CI, CI->getPointerOperandIndex());
+
   emitExpandAtomicAddrSpacePredicate(CI);
 }
 
+void SITargetLowering::emitExpandAtomicLoad(LoadInst *LI) const {
+  if (LI->getPointerAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS)
+return convertScratchAtomicToFlat
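
(The patch text is truncated at this point in the archive.)

For readers skimming the truncated patch, the decision it introduces can be sketched outside of LLVM. This is a minimal stand-alone sketch, not the backend code: `ExpansionKind`, `AddrSpace`, `Subtarget` and both functions are hypothetical stand-ins that only mirror the shape of `getPrivateAtomicExpansionKind` and `shouldExpandAtomicLoadInIR` as shown in the diff above.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; the real enums and subtarget
// class live in LLVM's TargetLowering and the AMDGPU backend.
enum class ExpansionKind { None, NotAtomic, Expand };
enum class AddrSpace { Flat, Global, Private };

struct Subtarget {
  // Stand-in for hasGloballyAddressableScratch().
  bool GloballyAddressableScratch;
};

// Private (scratch) atomics: request custom expansion when scratch is
// globally addressable, otherwise keep treating them as non-atomic.
ExpansionKind privateAtomicExpansionKind(const Subtarget &ST) {
  return ST.GloballyAddressableScratch ? ExpansionKind::Expand
                                       : ExpansionKind::NotAtomic;
}

// Same shape as the load/store hooks in the diff: only the private
// address space takes the new path; everything else is left alone.
ExpansionKind shouldExpandAtomicLoad(AddrSpace AS, const Subtarget &ST) {
  return AS == AddrSpace::Private ? privateAtomicExpansionKind(ST)
                                  : ExpansionKind::None;
}
```

With a `Subtarget{true}`, a private-address-space load maps to `Expand`; with `Subtarget{false}` it maps to `NotAtomic`, which is the behaviour the patch replaces unconditionally returning.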

[llvm-branch-commits] [llvm] [AMDGPU] Precommit memory legalizer tests for private AS (PR #154709)

2025-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)


Changes



---

Patch is 4.65 MiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/154709.diff


5 Files Affected:

- (added) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll (+20976) 
- (added) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-singlethread.ll 
(+21253) 
- (added) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-system.ll (+19992) 
- (added) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-wavefront.ll 
(+21253) 
- (added) llvm/test/CodeGen/AMDGPU/memory-legalizer-private-workgroup.ll 
(+21253) 


``diff
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll 
b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
new file mode 100644
index 0..af5b529fc387e
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-private-agent.ll
@@ -0,0 +1,20976 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx600 < %s | FileCheck 
--check-prefixes=GFX6 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx700 < %s | FileCheck 
--check-prefixes=GFX7 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1010 < %s | FileCheck 
--check-prefixes=GFX10-WGP %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1010 -mattr=+cumode < %s | 
FileCheck --check-prefixes=GFX10-CU %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -O0 -mcpu=gfx700 
-amdgcn-skip-cache-invalidations < %s | FileCheck 
--check-prefixes=SKIP-CACHE-INV %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx90a < %s | FileCheck 
-check-prefixes=GFX90A-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx90a -mattr=+tgsplit < %s | 
FileCheck -check-prefixes=GFX90A-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx942 < %s | FileCheck 
-check-prefixes=GFX942-NOTTGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx942 -mattr=+tgsplit < %s | 
FileCheck -check-prefixes=GFX942-TGSPLIT %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1100 < %s | FileCheck 
--check-prefixes=GFX11-WGP %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1100 -mattr=+cumode < %s | 
FileCheck --check-prefixes=GFX11-CU %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 < %s | FileCheck 
--check-prefixes=GFX12-WGP %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1200 -mattr=+cumode < %s | 
FileCheck --check-prefixes=GFX12-CU %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -O0 -mcpu=gfx1250 < %s | FileCheck 
--check-prefixes=GFX1250 %s
+
+define amdgpu_kernel void @private_agent_unordered_load(
+; GFX6-LABEL: private_agent_unordered_load:
+; GFX6:   ; %bb.0: ; %entry
+; GFX6-NEXT:s_add_u32 s0, s0, s15
+; GFX6-NEXT:s_addc_u32 s1, s1, 0
+; GFX6-NEXT:s_load_dword s5, s[8:9], 0x0
+; GFX6-NEXT:s_load_dword s4, s[8:9], 0x1
+; GFX6-NEXT:s_waitcnt lgkmcnt(0)
+; GFX6-NEXT:v_mov_b32_e32 v0, s5
+; GFX6-NEXT:buffer_load_dword v0, v0, s[0:3], 0 offen
+; GFX6-NEXT:v_mov_b32_e32 v1, s4
+; GFX6-NEXT:s_waitcnt vmcnt(0)
+; GFX6-NEXT:buffer_store_dword v0, v1, s[0:3], 0 offen
+; GFX6-NEXT:s_endpgm
+;
+; GFX7-LABEL: private_agent_unordered_load:
+; GFX7:   ; %bb.0: ; %entry
+; GFX7-NEXT:s_add_u32 s0, s0, s17
+; GFX7-NEXT:s_addc_u32 s1, s1, 0
+; GFX7-NEXT:s_load_dword s5, s[8:9], 0x0
+; GFX7-NEXT:s_load_dword s4, s[8:9], 0x1
+; GFX7-NEXT:s_waitcnt lgkmcnt(0)
+; GFX7-NEXT:v_mov_b32_e32 v0, s5
+; GFX7-NEXT:buffer_load_dword v0, v0, s[0:3], 0 offen
+; GFX7-NEXT:v_mov_b32_e32 v1, s4
+; GFX7-NEXT:s_waitcnt vmcnt(0)
+; GFX7-NEXT:buffer_store_dword v0, v1, s[0:3], 0 offen
+; GFX7-NEXT:s_endpgm
+;
+; GFX10-WGP-LABEL: private_agent_unordered_load:
+; GFX10-WGP:   ; %bb.0: ; %entry
+; GFX10-WGP-NEXT:s_add_u32 s0, s0, s17
+; GFX10-WGP-NEXT:s_addc_u32 s1, s1, 0
+; GFX10-WGP-NEXT:s_load_dword s5, s[8:9], 0x0
+; GFX10-WGP-NEXT:s_load_dword s4, s[8:9], 0x4
+; GFX10-WGP-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-WGP-NEXT:v_mov_b32_e32 v0, s5
+; GFX10-WGP-NEXT:buffer_load_dword v0, v0, s[0:3], 0 offen
+; GFX10-WGP-NEXT:v_mov_b32_e32 v1, s4
+; GFX10-WGP-NEXT:s_waitcnt vmcnt(0)
+; GFX10-WGP-NEXT:buffer_store_dword v0, v1, s[0:3], 0 offen
+; GFX10-WGP-NEXT:s_endpgm
+;
+; GFX10-CU-LABEL: private_agent_unordered_load:
+; GFX10-CU:   ; %bb.0: ; %entry
+; GFX10-CU-NEXT:s_add_u32 s0, s0, s17
+; GFX10-CU-NEXT:s_addc_u32 s1, s1, 0
+; GFX10-CU-NEXT:s_load_dword s5, s[8:9], 0x0
+; GFX10-CU-NEXT:s_load_dword s4, s[8:9], 0x4
+; GFX10-CU-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-CU-NEXT:v_mov_b32_e32 v0, s5
+; GFX10-CU-NEXT:buffer_load_dword v0, v0, s[0:3], 0 offen
+; GFX10-CU-NEXT:v_mov_b32_e32 v1, s4
+; GFX10-CU-NEXT:s_waitcnt vmcnt(0)
+; GFX10-CU-NEXT:buffer_store_dword v0, v1, s[0:3], 0 offen
+; GFX10-CU-NEXT:s_endpgm
+;
+; 

[llvm-branch-commits] [llvm] [AMDGPU] Precommit memory legalizer tests for private AS (PR #154709)

2025-08-21 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/154709?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#154710**
* **#154709** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/154709?utm_source=stack-comment-view-in-graphite)
* **#154708**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev?utm-source=stack-comment).
Learn more about stacking: https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/154709
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Expand scratch atomics to flat atomics if GAS is enabled (PR #154710)

2025-08-21 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/154710?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#154710** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/154710?utm_source=stack-comment-view-in-graphite)
* **#154709**
* **#154708**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev?utm-source=stack-comment).
Learn more about stacking: https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/154710
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang-tools-extra] 4a878c3 - Revert "[clangd] Add feature modules registry (#153756)"

2025-08-21 Thread via llvm-branch-commits

Author: Aleksandr Platonov
Date: 2025-08-21T12:32:04+03:00
New Revision: 4a878c35d8a033dff694889c6f8218e1f0837307

URL: 
https://github.com/llvm/llvm-project/commit/4a878c35d8a033dff694889c6f8218e1f0837307
DIFF: 
https://github.com/llvm/llvm-project/commit/4a878c35d8a033dff694889c6f8218e1f0837307.diff

LOG: Revert "[clangd] Add feature modules registry (#153756)"

This reverts commit ff5767a02c878070bea35a667301ca66082cf400.

Added: 


Modified: 
clang-tools-extra/clangd/FeatureModule.cpp
clang-tools-extra/clangd/FeatureModule.h
clang-tools-extra/clangd/tool/ClangdMain.cpp

Removed: 




diff  --git a/clang-tools-extra/clangd/FeatureModule.cpp 
b/clang-tools-extra/clangd/FeatureModule.cpp
index b6d700134919d..872cea1443789 100644
--- a/clang-tools-extra/clangd/FeatureModule.cpp
+++ b/clang-tools-extra/clangd/FeatureModule.cpp
@@ -22,10 +22,6 @@ FeatureModule::Facilities &FeatureModule::facilities() {
   return *Fac;
 }
 
-void FeatureModuleSet::add(std::unique_ptr<FeatureModule> M) {
-  Modules.push_back(std::move(M));
-}
-
 bool FeatureModuleSet::addImpl(void *Key, std::unique_ptr<FeatureModule> M,
                                const char *Source) {
   if (!Map.try_emplace(Key, M.get()).second) {
@@ -39,5 +35,3 @@ bool FeatureModuleSet::addImpl(void *Key, std::unique_ptr<FeatureModule> M,
 
 } // namespace clangd
 } // namespace clang
-
-LLVM_INSTANTIATE_REGISTRY(clang::clangd::FeatureModuleRegistry)

diff  --git a/clang-tools-extra/clangd/FeatureModule.h 
b/clang-tools-extra/clangd/FeatureModule.h
index 075db954a606a..7b6883507be3f 100644
--- a/clang-tools-extra/clangd/FeatureModule.h
+++ b/clang-tools-extra/clangd/FeatureModule.h
@@ -15,7 +15,6 @@
 #include "llvm/ADT/FunctionExtras.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/JSON.h"
-#include "llvm/Support/Registry.h"
 #include 
 #include 
 #include 
@@ -144,14 +143,9 @@ class FeatureModule {
 
 /// A FeatureModuleSet is a collection of feature modules installed in clangd.
 ///
-/// Modules added with explicit type specification can be looked up by type, or
-/// used via the FeatureModule interface. This allows individual modules to
-/// expose a public API. For this reason, there can be only one feature module
-/// of each type.
-///
-/// Modules added using a base class pointer can be used only via the
-/// FeatureModule interface and can't be looked up by type, thus custom public
-/// API (if provided by the module) can't be used.
+/// Modules can be looked up by type, or used via the FeatureModule interface.
+/// This allows individual modules to expose a public API.
+/// For this reason, there can be only one feature module of each type.
 ///
 /// The set owns the modules. It is itself owned by main, not ClangdServer.
 class FeatureModuleSet {
@@ -178,7 +172,6 @@ class FeatureModuleSet {
   const_iterator begin() const { return const_iterator(Modules.begin()); }
   const_iterator end() const { return const_iterator(Modules.end()); }
 
-  void add(std::unique_ptr<FeatureModule> M);
   template <typename Mod> bool add(std::unique_ptr<Mod> M) {
     return addImpl(&ID<Mod>::Key, std::move(M), LLVM_PRETTY_FUNCTION);
   }
@@ -192,8 +185,6 @@ class FeatureModuleSet {
 
 template <typename Mod> int FeatureModuleSet::ID<Mod>::Key;
 
-using FeatureModuleRegistry = llvm::Registry<FeatureModule>;
-
 } // namespace clangd
 } // namespace clang
 #endif

diff  --git a/clang-tools-extra/clangd/tool/ClangdMain.cpp 
b/clang-tools-extra/clangd/tool/ClangdMain.cpp
index 827233dd6486c..f287439f10cab 100644
--- a/clang-tools-extra/clangd/tool/ClangdMain.cpp
+++ b/clang-tools-extra/clangd/tool/ClangdMain.cpp
@@ -1017,14 +1017,6 @@ clangd accepts flags on the commandline, and in the CLANGD_FLAGS environment var
                : static_cast<int>(ErrorResultCode::CheckFailed);
   }
 
-  FeatureModuleSet ModuleSet;
-  for (FeatureModuleRegistry::entry E : FeatureModuleRegistry::entries()) {
-    vlog("Adding feature module '{0}' ({1})", E.getName(), E.getDesc());
-    ModuleSet.add(E.instantiate());
-  }
-  if (ModuleSet.begin() != ModuleSet.end())
-    Opts.FeatureModules = &ModuleSet;
-
   // Initialize and run ClangdLSPServer.
   // Change stdin to binary to not lose \r\n on windows.
   llvm::sys::ChangeStdinToBinary();
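
The doc comment restored by this revert says modules can be looked up by type and that there can be only one module of each type. A minimal stand-alone sketch of that idea follows, assuming a per-type static whose address serves as the lookup key (the diff's `&ID::Key` suggests something similar). `Module`, `ModuleSet`, `Indexer` and `Formatter` are hypothetical names for illustration and are not clangd's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical base class standing in for a feature-module interface.
struct Module {
  virtual ~Module() = default;
};

class ModuleSet {
  std::vector<std::unique_ptr<Module>> Modules; // owns the modules
  std::unordered_map<void *, Module *> Map;     // type key -> module

  // One static int per module type; its address identifies the type.
  template <typename Mod> struct ID {
    static int Key;
  };

public:
  // Returns false (and discards M) if a module of this type already exists.
  template <typename Mod> bool add(std::unique_ptr<Mod> M) {
    if (!Map.try_emplace(&ID<Mod>::Key, M.get()).second)
      return false;
    Modules.push_back(std::move(M));
    return true;
  }

  // Returns the module of type Mod, or nullptr if none was added.
  template <typename Mod> Mod *get() {
    auto It = Map.find(&ID<Mod>::Key);
    return It == Map.end() ? nullptr : static_cast<Mod *>(It->second);
  }

  std::size_t size() const { return Modules.size(); }
};

template <typename Mod> int ModuleSet::ID<Mod>::Key;

// Two hypothetical module types used to exercise the set.
struct Indexer : Module {};
struct Formatter : Module {};
```

Adding a second `Indexer` to the same set is rejected, while `get<Formatter>()` returns `nullptr` until a `Formatter` is added. The reverted overload that took a base-class pointer had no type key, which is why such modules could not be looked up by type.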



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Precommit memory legalizer tests for private AS (PR #154709)

2025-08-21 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh ready_for_review 
https://github.com/llvm/llvm-project/pull/154709
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Expand scratch atomics to flat atomics if GAS is enabled (PR #154710)

2025-08-21 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh ready_for_review 
https://github.com/llvm/llvm-project/pull/154710
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 2e6126e - Merge branch 'main' into revert-153756-features-registry

2025-08-21 Thread via llvm-branch-commits

Author: Aleksandr Platonov
Date: 2025-08-21T12:36:28+03:00
New Revision: 2e6126e46f3ee10fc3221df97a609a4baba98226

URL: 
https://github.com/llvm/llvm-project/commit/2e6126e46f3ee10fc3221df97a609a4baba98226
DIFF: 
https://github.com/llvm/llvm-project/commit/2e6126e46f3ee10fc3221df97a609a4baba98226.diff

LOG: Merge branch 'main' into revert-153756-features-registry

Added: 


Modified: 
llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

Removed: 




diff  --git a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h 
b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
index 9f036fbd569b6..18ab7ddb425ab 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
@@ -361,6 +361,12 @@ m_c_Binary(const Op0_t &Op0, const Op1_t &Op1) {
   return AllRecipe_commutative_match(Op0, Op1);
 }
 
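+template <typename Op0_t, typename Op1_t>
+inline AllRecipe_match<Instruction::Sub, Op0_t, Op1_t> m_Sub(const Op0_t &Op0,
+                                                             const Op1_t &Op1) {
+  return m_Binary<Instruction::Sub, Op0_t, Op1_t>(Op0, Op1);
+}
+
 template <typename Op0_t, typename Op1_t>
 inline AllRecipe_match<Instruction::Mul, Op0_t, Op1_t> m_Mul(const Op0_t &Op0,
                                                              const Op1_t &Op1) {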

diff  --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp 
b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 1641eb0776ecf..89214b410fab4 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -326,8 +326,7 @@ VPPartialReductionRecipe::computeCost(ElementCount VF,
   // Pick out opcode, type/ext information and use sub side effects from a widen
   // recipe.
   auto HandleWiden = [&](VPWidenRecipe *Widen) {
-    if (match(Widen,
-              m_Binary<Instruction::Sub>(m_SpecificInt(0), m_VPValue(Op)))) {
+    if (match(Widen, m_Sub(m_SpecificInt(0), m_VPValue(Op)))) {
       Widen = dyn_cast<VPWidenRecipe>(Op->getDefiningRecipe());
     }
     Opcode = Widen->getOpcode();

diff  --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp 
b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index dbbead3bb782e..b25fc0af1fb51 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -753,8 +753,7 @@ static VPWidenInductionRecipe *getOptimizableIVOf(VPValue *VPV) {
   // IVStep will be the negated step of the subtraction. Check if Step == -1
   // * IVStep.
   VPValue *Step;
-  if (!match(VPV,
-             m_Binary<Instruction::Sub>(m_VPValue(), m_VPValue(Step))) ||
+  if (!match(VPV, m_Sub(m_VPValue(), m_VPValue(Step))) ||
       !Step->isLiveIn() || !IVStep->isLiveIn())
     return false;
   auto *StepCI = dyn_cast<ConstantInt>(Step->getLiveInIRValue());



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 955c475 - [VPlan] Add m_Sub to VPlanPatternMatch. NFC (#154705)

2025-08-21 Thread via llvm-branch-commits

Author: Luke Lau
Date: 2025-08-21T09:33:46Z
New Revision: 955c475ae6622cb730ed7e75fcdefa115aaba858

URL: 
https://github.com/llvm/llvm-project/commit/955c475ae6622cb730ed7e75fcdefa115aaba858
DIFF: 
https://github.com/llvm/llvm-project/commit/955c475ae6622cb730ed7e75fcdefa115aaba858.diff

LOG: [VPlan] Add m_Sub to VPlanPatternMatch. NFC (#154705)

To mirror PatternMatch.h, and we'll also be able to use it in #152167

Added: 


Modified: 
llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

Removed: 




diff  --git a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h 
b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
index 9f036fbd569b6..18ab7ddb425ab 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
@@ -361,6 +361,12 @@ m_c_Binary(const Op0_t &Op0, const Op1_t &Op1) {
   return AllRecipe_commutative_match(Op0, Op1);
 }
 
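+template <typename Op0_t, typename Op1_t>
+inline AllRecipe_match<Instruction::Sub, Op0_t, Op1_t> m_Sub(const Op0_t &Op0,
+                                                             const Op1_t &Op1) {
+  return m_Binary<Instruction::Sub, Op0_t, Op1_t>(Op0, Op1);
+}
+
 template <typename Op0_t, typename Op1_t>
 inline AllRecipe_match<Instruction::Mul, Op0_t, Op1_t> m_Mul(const Op0_t &Op0,
                                                              const Op1_t &Op1) {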

diff  --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp 
b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 1641eb0776ecf..89214b410fab4 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -326,8 +326,7 @@ VPPartialReductionRecipe::computeCost(ElementCount VF,
   // Pick out opcode, type/ext information and use sub side effects from a widen
   // recipe.
   auto HandleWiden = [&](VPWidenRecipe *Widen) {
-    if (match(Widen,
-              m_Binary<Instruction::Sub>(m_SpecificInt(0), m_VPValue(Op)))) {
+    if (match(Widen, m_Sub(m_SpecificInt(0), m_VPValue(Op)))) {
       Widen = dyn_cast<VPWidenRecipe>(Op->getDefiningRecipe());
     }
     Opcode = Widen->getOpcode();

diff  --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp 
b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index dbbead3bb782e..b25fc0af1fb51 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -753,8 +753,7 @@ static VPWidenInductionRecipe *getOptimizableIVOf(VPValue *VPV) {
   // IVStep will be the negated step of the subtraction. Check if Step == -1
   // * IVStep.
   VPValue *Step;
-  if (!match(VPV,
-             m_Binary<Instruction::Sub>(m_VPValue(), m_VPValue(Step))) ||
+  if (!match(VPV, m_Sub(m_VPValue(), m_VPValue(Step))) ||
       !Step->isLiveIn() || !IVStep->isLiveIn())
     return false;
   auto *StepCI = dyn_cast<ConstantInt>(Step->getLiveInIRValue());
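
As a rough illustration of what the NFC change buys, here is a toy matcher library in which `m_Sub` is a thin named wrapper over an opcode-parameterised `m_Binary`. Everything in it (`Node`, `Opcode`, `AnyValue`, `SpecificInt`, `BinaryMatch`, `isNegation`) is a hypothetical stand-in written for this note. It is not the VPlan API; only the wrapper pattern is borrowed from the diff above.

```cpp
#include <cassert>

// Toy expression node; NOT a VPlan recipe.
enum class Opcode { Add, Sub, Mul };

struct Node {
  Opcode Op;
  int LHS;
  int RHS;
};

// Matches any operand and optionally binds it to *Bind.
struct AnyValue {
  int *Bind = nullptr;
  bool match(int V) const {
    if (Bind)
      *Bind = V;
    return true;
  }
};

// Matches one specific integer operand.
struct SpecificInt {
  int Expected;
  bool match(int V) const { return V == Expected; }
};

// Generic binary matcher parameterised on the opcode.
template <Opcode Opc, typename Op0_t, typename Op1_t> struct BinaryMatch {
  Op0_t Op0;
  Op1_t Op1;
  bool match(const Node &N) const {
    return N.Op == Opc && Op0.match(N.LHS) && Op1.match(N.RHS);
  }
};

template <Opcode Opc, typename Op0_t, typename Op1_t>
BinaryMatch<Opc, Op0_t, Op1_t> m_Binary(const Op0_t &Op0, const Op1_t &Op1) {
  return {Op0, Op1};
}

// Named wrapper: call sites no longer spell out the opcode.
template <typename Op0_t, typename Op1_t>
BinaryMatch<Opcode::Sub, Op0_t, Op1_t> m_Sub(const Op0_t &Op0,
                                             const Op1_t &Op1) {
  return m_Binary<Opcode::Sub>(Op0, Op1);
}

template <typename Pattern> bool match(const Node &N, const Pattern &P) {
  return P.match(N);
}

// Returns true and sets Negated when N has the form (0 - x).
inline bool isNegation(const Node &N, int &Negated) {
  return match(N, m_Sub(SpecificInt{0}, AnyValue{&Negated}));
}
```

The call site in `isNegation` reads the same way as the rewritten `match(Widen, m_Sub(m_SpecificInt(0), m_VPValue(Op)))` in the patch: the opcode moves out of a template argument and into the matcher's name, with no change in what is matched.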



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [analyzer][docs] CSA release notes for clang-21 (PR #154600)

2025-08-21 Thread Gábor Horváth via llvm-branch-commits

Xazax-hun wrote:

Looks good to me. Are all of those crashes present in previously released 
stable versions?

https://github.com/llvm/llvm-project/pull/154600
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [analyzer][docs] CSA release notes for clang-21 (PR #154600)

2025-08-21 Thread Gábor Horváth via llvm-branch-commits

https://github.com/Xazax-hun approved this pull request.


https://github.com/llvm/llvm-project/pull/154600
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #150170)

2025-08-21 Thread via llvm-branch-commits

https://github.com/easyonaadit updated 
https://github.com/llvm/llvm-project/pull/150170

>From 34355ed73cdf9eb09f8e867df39c07feae88cb7e Mon Sep 17 00:00:00 2001
From: Aaditya 
Date: Sat, 19 Jul 2025 12:57:27 +0530
Subject: [PATCH] Add builtins for wave reduction intrinsics

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def |  25 ++
 clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp  |  58 +++
 clang/test/CodeGenOpenCL/builtins-amdgcn.cl  | 378 +++
 3 files changed, 461 insertions(+)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index f8f55772db8fe..77344b999dd84 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -361,6 +361,31 @@ BUILTIN(__builtin_amdgcn_endpgm, "v", "nr")
 BUILTIN(__builtin_amdgcn_get_fpenv, "WUi", "n")
 BUILTIN(__builtin_amdgcn_set_fpenv, "vWUi", "n")
 
+//===--===//
+
+// Wave Reduction builtins.
+
+//===--===//
+
+BUILTIN(__builtin_amdgcn_wave_reduce_add_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_i32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_i32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_and_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_or_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_xor_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_add_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_i64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_i64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_and_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_or_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_xor_b64, "WiWiZi", "nc")
+
 //===--===//
 // R600-NI only builtins.
 //===--===//
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp 
b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index dad1f95ac710d..7471dc1bb3d50 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -295,11 +295,69 @@ void 
CodeGenFunction::AddAMDGPUFenceAddressSpaceMMRA(llvm::Instruction *Inst,
   Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs));
 }
 
+static Intrinsic::ID getIntrinsicIDforWaveReduction(unsigned BuiltinID) {
+  switch (BuiltinID) {
+  default:
+llvm_unreachable("Unknown BuiltinID for wave reduction");
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u64:
+return Intrinsic::amdgcn_wave_reduce_add;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u64:
+return Intrinsic::amdgcn_wave_reduce_sub;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i64:
+return Intrinsic::amdgcn_wave_reduce_min;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u64:
+return Intrinsic::amdgcn_wave_reduce_umin;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i64:
+return Intrinsic::amdgcn_wave_reduce_max;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u64:
+return Intrinsic::amdgcn_wave_reduce_umax;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b64:
+return Intrinsic::amdgcn_wave_reduce_and;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b64:
+return Intrinsic::amdgcn_wave_reduce_or;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b64:
+return Intrinsic::amdgcn_wave_reduce_xor;
+  }
+}
+
 Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   const CallExpr *E) {
   llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent;
   llvm::SyncScope::ID SSID;
   switch (BuiltinID) {
+  case AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
+  case AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u

[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #150170)

2025-08-21 Thread via llvm-branch-commits

https://github.com/easyonaadit updated 
https://github.com/llvm/llvm-project/pull/150170

>From 34355ed73cdf9eb09f8e867df39c07feae88cb7e Mon Sep 17 00:00:00 2001
From: Aaditya 
Date: Sat, 19 Jul 2025 12:57:27 +0530
Subject: [PATCH] Add builtins for wave reduction intrinsics

---
 clang/include/clang/Basic/BuiltinsAMDGPU.def |  25 ++
 clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp  |  58 +++
 clang/test/CodeGenOpenCL/builtins-amdgcn.cl  | 378 +++
 3 files changed, 461 insertions(+)

diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def 
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index f8f55772db8fe..77344b999dd84 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -361,6 +361,31 @@ BUILTIN(__builtin_amdgcn_endpgm, "v", "nr")
 BUILTIN(__builtin_amdgcn_get_fpenv, "WUi", "n")
 BUILTIN(__builtin_amdgcn_set_fpenv, "vWUi", "n")
 
+//===--===//
+
+// Wave Reduction builtins.
+
+//===--===//
+
+BUILTIN(__builtin_amdgcn_wave_reduce_add_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_i32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_i32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_and_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_or_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_xor_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_add_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_i64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_i64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_and_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_or_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_xor_b64, "WiWiZi", "nc")
+
 
//===----------------------------------------------------------------------===//
 // R600-NI only builtins.
 
//===----------------------------------------------------------------------===//
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp 
b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index dad1f95ac710d..7471dc1bb3d50 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -295,11 +295,69 @@ void 
CodeGenFunction::AddAMDGPUFenceAddressSpaceMMRA(llvm::Instruction *Inst,
   Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs));
 }
 
+static Intrinsic::ID getIntrinsicIDforWaveReduction(unsigned BuiltinID) {
+  switch (BuiltinID) {
+  default:
+    llvm_unreachable("Unknown BuiltinID for wave reduction");
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u64:
+    return Intrinsic::amdgcn_wave_reduce_add;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u64:
+    return Intrinsic::amdgcn_wave_reduce_sub;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i64:
+    return Intrinsic::amdgcn_wave_reduce_min;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u64:
+    return Intrinsic::amdgcn_wave_reduce_umin;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i64:
+    return Intrinsic::amdgcn_wave_reduce_max;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u64:
+    return Intrinsic::amdgcn_wave_reduce_umax;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b64:
+    return Intrinsic::amdgcn_wave_reduce_and;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b64:
+    return Intrinsic::amdgcn_wave_reduce_or;
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b32:
+  case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b64:
+    return Intrinsic::amdgcn_wave_reduce_xor;
+  }
+}
+
 Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   const CallExpr *E) {
   llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent;
   llvm::SyncScope::ID SSID;
   switch (BuiltinID) {
+  case AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
+  case AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u
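
A note on the mapping in getIntrinsicIDforWaveReduction above: the 32-bit and 64-bit variants of each builtin share one intrinsic, and only min and max distinguish signed from unsigned (the `min_u*` and `max_u*` builtins select the `umin` and `umax` intrinsics). The standalone C sketch below restates that table outside of Clang. The helper `wave_reduce_intrinsic`, the table name, and the string-suffix interface are hypothetical and exist only for illustration; they are not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only: restates the switch in
   getIntrinsicIDforWaveReduction as a lookup table over name suffixes. */
struct wave_reduce_map {
  const char *op;        /* operation in the builtin name, e.g. "min" */
  char kind;             /* 'i' signed, 'u' unsigned, 'b' bitwise */
  const char *intrinsic; /* suffix after Intrinsic::amdgcn_wave_reduce_ */
};

static const struct wave_reduce_map kWaveReduceMap[] = {
    {"add", 'u', "add"},  {"sub", 'u', "sub"}, {"min", 'i', "min"},
    {"min", 'u', "umin"}, {"max", 'i', "max"}, {"max", 'u', "umax"},
    {"and", 'b', "and"},  {"or", 'b', "or"},   {"xor", 'b', "xor"},
};

/* Takes the part of the builtin name after "__builtin_amdgcn_wave_reduce_"
   (for example "min_u32") and returns the intrinsic suffix, or NULL if the
   patch defines no such builtin. */
static const char *wave_reduce_intrinsic(const char *suffix) {
  const char *sep = strchr(suffix, '_');
  if (sep == NULL || sep[1] == '\0')
    return NULL;
  size_t op_len = (size_t)(sep - suffix);
  const char *width = sep + 2; /* text after the kind letter */
  if (strcmp(width, "32") != 0 && strcmp(width, "64") != 0)
    return NULL;
  for (size_t i = 0; i < sizeof kWaveReduceMap / sizeof kWaveReduceMap[0];
       ++i) {
    const struct wave_reduce_map *m = &kWaveReduceMap[i];
    if (strlen(m->op) == op_len && strncmp(m->op, suffix, op_len) == 0 &&
        m->kind == sep[1])
      return m->intrinsic;
  }
  return NULL;
}
```

Under these assumptions, `wave_reduce_intrinsic("min_u32")` yields `"umin"` while `wave_reduce_intrinsic("min_i32")` yields `"min"`, and a combination the patch does not define, such as `"add_i32"`, yields NULL.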

[llvm-branch-commits] [llvm] 44d2ee0 - Revert "[SDAG[[X86] Added method to scalarize `STRICT_FSETCC` (#154486)"

2025-08-21 Thread via llvm-branch-commits

Author: Abhishek Kaushik
Date: 2025-08-21T13:08:03+05:30
New Revision: 44d2ee0080ef4be770bc351e15e89665971654b5

URL: 
https://github.com/llvm/llvm-project/commit/44d2ee0080ef4be770bc351e15e89665971654b5
DIFF: 
https://github.com/llvm/llvm-project/commit/44d2ee0080ef4be770bc351e15e89665971654b5.diff

LOG: Revert "[SDAG[[X86] Added method to scalarize `STRICT_FSETCC` (#154486)"

This reverts commit 62aaa96d6f23acdaf7baaec98f03c9525c4189ee.

Added: 


Modified: 
llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

Removed: 
llvm/test/CodeGen/X86/fp80-strict-vec-cmp.ll



diff  --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index 65fd863e55ac9..33fa3012618b3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -909,7 +909,6 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
   SDValue ScalarizeVecOp_EXTRACT_VECTOR_ELT(SDNode *N);
   SDValue ScalarizeVecOp_VSELECT(SDNode *N);
   SDValue ScalarizeVecOp_VSETCC(SDNode *N);
-  SDValue ScalarizeVecOp_VSTRICT_FSETCC(SDNode *N, unsigned OpNo);
   SDValue ScalarizeVecOp_STORE(StoreSDNode *N, unsigned OpNo);
   SDValue ScalarizeVecOp_FP_ROUND(SDNode *N, unsigned OpNo);
   SDValue ScalarizeVecOp_STRICT_FP_ROUND(SDNode *N, unsigned OpNo);

diff  --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 10e3a5149a5dc..125bd54397935 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -789,10 +789,6 @@ bool DAGTypeLegalizer::ScalarizeVectorOperand(SDNode *N, 
unsigned OpNo) {
   case ISD::SETCC:
     Res = ScalarizeVecOp_VSETCC(N);
     break;
-  case ISD::STRICT_FSETCC:
-  case ISD::STRICT_FSETCCS:
-    Res = ScalarizeVecOp_VSTRICT_FSETCC(N, OpNo);
-    break;
   case ISD::STORE:
     Res = ScalarizeVecOp_STORE(cast<StoreSDNode>(N), OpNo);
     break;
@@ -989,43 +985,6 @@ SDValue DAGTypeLegalizer::ScalarizeVecOp_VSETCC(SDNode *N) 
{
   return DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, VT, Res);
 }
 
-// Similiar to ScalarizeVecOp_VSETCC, with added logic to update chains.
-SDValue DAGTypeLegalizer::ScalarizeVecOp_VSTRICT_FSETCC(SDNode *N,
-                                                        unsigned OpNo) {
-  assert(OpNo == 1 && "Wrong operand for scalarization!");
-  assert(N->getValueType(0).isVector() &&
-         N->getOperand(1).getValueType().isVector() &&
-         "Operand types must be vectors");
-  assert(N->getValueType(0) == MVT::v1i1 && "Expected v1i1 type");
-
-  EVT VT = N->getValueType(0);
-  SDValue Ch = N->getOperand(0);
-  SDValue LHS = GetScalarizedVector(N->getOperand(1));
-  SDValue RHS = GetScalarizedVector(N->getOperand(2));
-  SDValue CC = N->getOperand(3);
-
-  EVT OpVT = N->getOperand(1).getValueType();
-  EVT NVT = VT.getVectorElementType();
-  SDLoc DL(N);
-  SDValue Res = DAG.getNode(N->getOpcode(), DL, {MVT::i1, MVT::Other},
-                            {Ch, LHS, RHS, CC});
-
-  // Legalize the chain result - switch anything that used the old chain to
-  // use the new one.
-  ReplaceValueWith(SDValue(N, 1), Res.getValue(1));
-
-  ISD::NodeType ExtendCode =
-      TargetLowering::getExtendForContent(TLI.getBooleanContents(OpVT));
-
-  Res = DAG.getNode(ExtendCode, DL, NVT, Res);
-  Res = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, VT, Res);
-
-  // Do our own replacement and return SDValue() to tell the caller that we
-  // handled all replacements since caller can only handle a single result.
-  ReplaceValueWith(SDValue(N, 0), Res);
-  return SDValue();
-}
-
 /// If the value to store is a vector that needs to be scalarized, it must be
 /// <1 x ty>. Just store the element.
 SDValue DAGTypeLegalizer::ScalarizeVecOp_STORE(StoreSDNode *N, unsigned OpNo){

diff  --git a/llvm/test/CodeGen/X86/fp80-strict-vec-cmp.ll 
b/llvm/test/CodeGen/X86/fp80-strict-vec-cmp.ll
deleted file mode 100644
index b4c77a573e859..0
--- a/llvm/test/CodeGen/X86/fp80-strict-vec-cmp.ll
+++ /dev/null
@@ -1,293 +0,0 @@
-; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 5
-; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f | FileCheck %s
-
-define <1 x i1> @test_oeq_q_v1f64(<1 x double> %a, <1 x double> %b) {
-; CHECK-LABEL: test_oeq_q_v1f64:
-; CHECK:   # %bb.0:
-; CHECK-NEXT:vucomisd %xmm1, %xmm0
-; CHECK-NEXT:setnp %cl
-; CHECK-NEXT:sete %al
-; CHECK-NEXT:andb %cl, %al
-; CHECK-NEXT:retq
-  %cond = tail call <1 x i1> @llvm.experimental.constrained.fcmp.v1f64(<1 x 
double> %a, <1 x double> %b, metadata !"oeq", metadata !"fpexcept.strict")
-  ret <1 x i1> %cond
-}
-
-define <1 x i1> @test_ogt_q_v1f64(<1 x double> %a, <1 x double> %b) {
-; CHECK-LABEL: test_ogt_q_v1f64:
-; CHECK:   # %bb.0:

[llvm-branch-commits] [clang] release/21.x: [NFC][Clang][Docs] Update Pointer Authentication documentation (#152596) (PR #154240)

2025-08-21 Thread Oliver Hunt via llvm-branch-commits

ojhunt wrote:

@EugeneZelenko @tru

Once the last of the functional changes is merged, I'll do a final sweep through 
the docs and release notes to ensure that the release notes and documentation 
are complete.

https://github.com/llvm/llvm-project/pull/154240
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] Move rest of documentation problems that found their way to the SA sec. (PR #154608)

2025-08-21 Thread Balazs Benics via llvm-branch-commits

https://github.com/steakhal requested changes to this pull request.

The `A new flag - `-static-libclosure`...` paragraph is still mentioned in the 
Static Analyzer New Features section.
I think the conflict resolution was wrong here.
I'll have a look later today.

https://github.com/llvm/llvm-project/pull/154608


[llvm-branch-commits] [clang] Move rest of documentation problems that found their way to the SA sec. (PR #154608)

2025-08-21 Thread Balazs Benics via llvm-branch-commits

https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/154608


[llvm-branch-commits] [clang] Move rest of documentation problems that found their way to the SA sec. (PR #154608)

2025-08-21 Thread Oliver Hunt via llvm-branch-commits

ojhunt wrote:

Dammit, yeah I wish I could still see the conflict to work out how I misread it 
- I suspect the addition of  "A new flag - `-static-libclosure`..." to new 
flags didn't conflict :-/


https://github.com/llvm/llvm-project/pull/154608


[llvm-branch-commits] [clang] Move rest of documentation problems that found their way to the SA sec. (PR #154608)

2025-08-21 Thread Balazs Benics via llvm-branch-commits

steakhal wrote:

> Dammit, yeah I wish I could still see the conflict to work out how I misread 
> it - I suspect the addition of "A new flag - `-static-libclosure`..." to new 
> flags didn't conflict :-/

You can actually, `git show --remerge-diff 
30401b1f918ea359334b507a79118938ffe3c169` 
([docs](https://git-scm.com/docs/git-show#Documentation/git-show.txt---remerge-diff)).
 :)

https://github.com/llvm/llvm-project/pull/154608


[llvm-branch-commits] [flang] [flang][openmp] Add parser/semantic support for workdistribute (PR #154377)

2025-08-21 Thread via llvm-branch-commits


@@ -141,6 +141,67 @@ class OmpWorkshareBlockChecker {
   parser::CharBlock source_;
 };
 
+// 'OmpWorkdistributeBlockChecker' is used to check the validity of the
+// assignment statements and the expressions enclosed in an OpenMP
+// workdistribute construct
+class OmpWorkdistributeBlockChecker {
+public:
+  OmpWorkdistributeBlockChecker(
+      SemanticsContext &context, parser::CharBlock source)
+      : context_{context}, source_{source} {}
+
+  template <typename T> bool Pre(const T &) { return true; }
+  template <typename T> void Post(const T &) {}
+
+  bool Pre(const parser::AssignmentStmt &assignment) {
+    const auto &var{std::get<parser::Variable>(assignment.t)};
+    const auto &expr{std::get<parser::Expr>(assignment.t)};
+    const auto *lhs{GetExpr(context_, var)};
+    const auto *rhs{GetExpr(context_, expr)};
+    if (lhs && rhs) {
+      Tristate isDefined{semantics::IsDefinedAssignment(
+          lhs->GetType(), lhs->Rank(), rhs->GetType(), rhs->Rank())};
+      if (isDefined == Tristate::Yes) {
+        context_.Say(expr.source,
+            "Defined assignment statement is not "
+            "allowed in a WORKDISTRIBUTE construct"_err_en_US);
+      }
+    }
+    return true;
+  }
+
+  bool Pre(const parser::Expr &expr) {
+    if (const auto *e{GetExpr(context_, expr)}) {

skc7 wrote:

Fixed it in the latest patch.

https://github.com/llvm/llvm-project/pull/154377


[llvm-branch-commits] [flang] [flang][openmp] Add parser/semantic support for workdistribute (PR #154377)

2025-08-21 Thread via llvm-branch-commits


@@ -141,6 +141,67 @@ class OmpWorkshareBlockChecker {
   parser::CharBlock source_;
 };
 
+// 'OmpWorkdistributeBlockChecker' is used to check the validity of the
+// assignment statements and the expressions enclosed in an OpenMP
+// workdistribute construct
+class OmpWorkdistributeBlockChecker {
+public:
+  OmpWorkdistributeBlockChecker(
+      SemanticsContext &context, parser::CharBlock source)
+      : context_{context}, source_{source} {}
+
+  template <typename T> bool Pre(const T &) { return true; }
+  template <typename T> void Post(const T &) {}
+
+  bool Pre(const parser::AssignmentStmt &assignment) {
+    const auto &var{std::get<parser::Variable>(assignment.t)};
+    const auto &expr{std::get<parser::Expr>(assignment.t)};
+    const auto *lhs{GetExpr(context_, var)};
+    const auto *rhs{GetExpr(context_, expr)};
+    if (lhs && rhs) {
+      Tristate isDefined{semantics::IsDefinedAssignment(
+          lhs->GetType(), lhs->Rank(), rhs->GetType(), rhs->Rank())};
+      if (isDefined == Tristate::Yes) {
+        context_.Say(expr.source,
+            "Defined assignment statement is not "
+            "allowed in a WORKDISTRIBUTE construct"_err_en_US);
+      }
+    }
+    return true;
+  }
+
+  bool Pre(const parser::Expr &expr) {
+    if (const auto *e{GetExpr(context_, expr)}) {
+      for (const Symbol &symbol : evaluate::CollectSymbols(*e)) {
+        const Symbol &root{GetAssociationRoot(symbol)};
+        if (IsFunction(root)) {
+          std::string attrs{""};

skc7 wrote:

Thanks for the feedback. Fixed it.

https://github.com/llvm/llvm-project/pull/154377


[llvm-branch-commits] [flang] [flang][openmp] Add parser/semantic support for workdistribute (PR #154377)

2025-08-21 Thread via llvm-branch-commits


@@ -4497,6 +4576,22 @@ void OmpStructureChecker::CheckWorkshareBlockStmts(
   }
 }
 
+void OmpStructureChecker::CheckWorkdistributeBlockStmts(
+    const parser::Block &block, parser::CharBlock source) {
+  OmpWorkdistributeBlockChecker ompWorkdistributeBlockChecker{context_, source};
+
+  for (auto it{block.begin()}; it != block.end(); ++it) {
+    if (parser::Unwrap<parser::AssignmentStmt>(*it)) {
+      parser::Walk(*it, ompWorkdistributeBlockChecker);
+    } else {
+      context_.Say(source,
+          "The structured block in a WORKDISTRIBUTE construct may consist of "
+          "only "
+          "SCALAR or ARRAY assignments"_err_en_US);

skc7 wrote:

Fixed all the error string concatenations. Also added a check for version 6.0 
for WORKDISTRIBUTE.

https://github.com/llvm/llvm-project/pull/154377


[llvm-branch-commits] [flang] [flang][openmp] Add parser/semantic support for workdistribute (PR #154377)

2025-08-21 Thread via llvm-branch-commits

https://github.com/skc7 updated https://github.com/llvm/llvm-project/pull/154377

>From 4442fced8216bf8e26522e2b4b127e4cfc40 Mon Sep 17 00:00:00 2001
From: skc7 
Date: Tue, 19 Aug 2025 21:43:06 +0530
Subject: [PATCH 1/2] [flang][openmp] Add parser/semantic support for
 workdistribute

---
 .../flang/Semantics/openmp-directive-sets.h   |  7 ++
 flang/lib/Parser/openmp-parsers.cpp   |  6 +-
 flang/lib/Semantics/check-omp-structure.cpp   | 95 +++
 flang/lib/Semantics/check-omp-structure.h |  1 +
 flang/lib/Semantics/resolve-directives.cpp|  8 +-
 flang/test/Parser/OpenMP/workdistribute.f90   | 27 ++
 .../Semantics/OpenMP/workdistribute01.f90 | 16 
 .../Semantics/OpenMP/workdistribute02.f90 | 34 +++
 .../Semantics/OpenMP/workdistribute03.f90 | 34 +++
 9 files changed, 226 insertions(+), 2 deletions(-)
 create mode 100644 flang/test/Parser/OpenMP/workdistribute.f90
 create mode 100644 flang/test/Semantics/OpenMP/workdistribute01.f90
 create mode 100644 flang/test/Semantics/OpenMP/workdistribute02.f90
 create mode 100644 flang/test/Semantics/OpenMP/workdistribute03.f90

diff --git a/flang/include/flang/Semantics/openmp-directive-sets.h 
b/flang/include/flang/Semantics/openmp-directive-sets.h
index cc66cc833e8b7..01e8481e05721 100644
--- a/flang/include/flang/Semantics/openmp-directive-sets.h
+++ b/flang/include/flang/Semantics/openmp-directive-sets.h
@@ -143,6 +143,7 @@ static const OmpDirectiveSet topTargetSet{
 Directive::OMPD_target_teams_distribute_parallel_do_simd,
 Directive::OMPD_target_teams_distribute_simd,
 Directive::OMPD_target_teams_loop,
+Directive::OMPD_target_teams_workdistribute,
 };
 
 static const OmpDirectiveSet allTargetSet{topTargetSet};
@@ -172,6 +173,7 @@ static const OmpDirectiveSet topTeamsSet{
 Directive::OMPD_teams_distribute_parallel_do_simd,
 Directive::OMPD_teams_distribute_simd,
 Directive::OMPD_teams_loop,
+Directive::OMPD_teams_workdistribute,
 };
 
 static const OmpDirectiveSet bottomTeamsSet{
@@ -187,6 +189,7 @@ static const OmpDirectiveSet allTeamsSet{
 Directive::OMPD_target_teams_distribute_parallel_do_simd,
 Directive::OMPD_target_teams_distribute_simd,
 Directive::OMPD_target_teams_loop,
+Directive::OMPD_target_teams_workdistribute,
 } | topTeamsSet,
 };
 
@@ -230,6 +233,9 @@ static const OmpDirectiveSet blockConstructSet{
 Directive::OMPD_taskgroup,
 Directive::OMPD_teams,
 Directive::OMPD_workshare,
+Directive::OMPD_target_teams_workdistribute,
+Directive::OMPD_teams_workdistribute,
+Directive::OMPD_workdistribute,
 };
 
 static const OmpDirectiveSet loopConstructSet{
@@ -376,6 +382,7 @@ static const OmpDirectiveSet 
nestedReduceWorkshareAllowedSet{
 };
 
 static const OmpDirectiveSet nestedTeamsAllowedSet{
+Directive::OMPD_workdistribute,
 Directive::OMPD_distribute,
 Directive::OMPD_distribute_parallel_do,
 Directive::OMPD_distribute_parallel_do_simd,
diff --git a/flang/lib/Parser/openmp-parsers.cpp 
b/flang/lib/Parser/openmp-parsers.cpp
index 56cee4ab38e9b..51b49a591b02f 100644
--- a/flang/lib/Parser/openmp-parsers.cpp
+++ b/flang/lib/Parser/openmp-parsers.cpp
@@ -1870,11 +1870,15 @@ TYPE_PARSER( //
     MakeBlockConstruct(llvm::omp::Directive::OMPD_target_data) ||
     MakeBlockConstruct(llvm::omp::Directive::OMPD_target_parallel) ||
     MakeBlockConstruct(llvm::omp::Directive::OMPD_target_teams) ||
+    MakeBlockConstruct(
+        llvm::omp::Directive::OMPD_target_teams_workdistribute) ||
     MakeBlockConstruct(llvm::omp::Directive::OMPD_target) ||
     MakeBlockConstruct(llvm::omp::Directive::OMPD_task) ||
     MakeBlockConstruct(llvm::omp::Directive::OMPD_taskgroup) ||
     MakeBlockConstruct(llvm::omp::Directive::OMPD_teams) ||
-    MakeBlockConstruct(llvm::omp::Directive::OMPD_workshare))
+    MakeBlockConstruct(llvm::omp::Directive::OMPD_teams_workdistribute) ||
+    MakeBlockConstruct(llvm::omp::Directive::OMPD_workshare) ||
+    MakeBlockConstruct(llvm::omp::Directive::OMPD_workdistribute))
 #undef MakeBlockConstruct
 
 // OMP SECTIONS Directive
diff --git a/flang/lib/Semantics/check-omp-structure.cpp 
b/flang/lib/Semantics/check-omp-structure.cpp
index 2b36b085ae08d..4c4e17c39c03a 100644
--- a/flang/lib/Semantics/check-omp-structure.cpp
+++ b/flang/lib/Semantics/check-omp-structure.cpp
@@ -141,6 +141,67 @@ class OmpWorkshareBlockChecker {
   parser::CharBlock source_;
 };
 
+// 'OmpWorkdistributeBlockChecker' is used to check the validity of the
+// assignment statements and the expressions enclosed in an OpenMP
+// workdistribute construct
+class OmpWorkdistributeBlockChecker {
+public:
+  OmpWorkdistributeBlockChecker(
+      SemanticsContext &context, parser::CharBlock source)
+      : context_{context}, source_{source} {}
+
+  template <typename T> bool Pre(const T &) { return true; }
+  template <typename T> void Post(const T &) {}
+
+  bool Pre(const parser::AssignmentStmt &assignment) {
+const aut

[llvm-branch-commits] [clang] release/21.x: [Driver] DragonFly does not support C11 threads (#154886) (PR #154897)

2025-08-21 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: None (llvmbot)


Changes

Backport 0fff4605922d137252875f072b3fb2973dbf9693

Requested by: @brad0

---
Full diff: https://github.com/llvm/llvm-project/pull/154897.diff


2 Files Affected:

- (modified) clang/lib/Basic/Targets/OSTargets.h (+3) 
- (modified) clang/test/Preprocessor/init.c (+8) 


``diff
diff --git a/clang/lib/Basic/Targets/OSTargets.h 
b/clang/lib/Basic/Targets/OSTargets.h
index 97b2caa22d8e4..c1a68f464e831 100644
--- a/clang/lib/Basic/Targets/OSTargets.h
+++ b/clang/lib/Basic/Targets/OSTargets.h
@@ -174,6 +174,9 @@ class LLVM_LIBRARY_VISIBILITY DragonFlyBSDTargetInfo
     DefineStd(Builder, "unix", Opts);
     if (this->HasFloat128)
       Builder.defineMacro("__FLOAT128__");
+
+    if (Opts.C11)
+      Builder.defineMacro("__STDC_NO_THREADS__");
   }
 
 public:
diff --git a/clang/test/Preprocessor/init.c b/clang/test/Preprocessor/init.c
index bed39dc3e34dc..7e0df96141364 100644
--- a/clang/test/Preprocessor/init.c
+++ b/clang/test/Preprocessor/init.c
@@ -1622,6 +1622,14 @@
 // RUN: %clang_cc1 -x c -std=c99 -E -dM -ffreestanding 
-triple=amd64-unknown-openbsd < /dev/null | FileCheck -match-full-lines 
-check-prefix OPENBSD-STDC-N %s
 // OPENBSD-STDC-N-NOT:#define __STDC_NO_THREADS__ 1
 //
+// RUN: %clang_cc1 -x c -std=c11 -E -dM -ffreestanding 
-triple=x86_64-unknown-dragonfly < /dev/null | FileCheck -match-full-lines 
-check-prefix DRAGONFLY-STDC %s
+// RUN: %clang_cc1 -x c -std=gnu11 -E -dM -ffreestanding 
-triple=x86_64-unknown-dragonfly < /dev/null | FileCheck -match-full-lines 
-check-prefix DRAGONFLY-STDC %s
+// RUN: %clang_cc1 -x c -std=c17 -E -dM -ffreestanding 
-triple=x86_64-unknown-dragonfly < /dev/null | FileCheck -match-full-lines 
-check-prefix DRAGONFLY-STDC %s
+// DRAGONFLY-STDC:#define __STDC_NO_THREADS__ 1
+//
+// RUN: %clang_cc1 -x c -std=c99 -E -dM -ffreestanding 
-triple=x86_64-unknown-dragonfly < /dev/null | FileCheck -match-full-lines 
-check-prefix DRAGONFLY-STDC-N %s
+// DRAGONFLY-STDC-N-NOT:#define __STDC_NO_THREADS__ 1
+//
 // RUN: %clang_cc1 -triple=aarch64-unknown-managarm-mlibc -E -dM < /dev/null | 
FileCheck -match-full-lines -check-prefix MANAGARM %s
 // RUN: %clang_cc1 -triple=riscv64-unknown-managarm-mlibc -E -dM < /dev/null | 
FileCheck -match-full-lines -check-prefix MANAGARM %s
 // RUN: %clang_cc1 -triple=x86_64-unknown-managarm-mlibc -E -dM < /dev/null | 
FileCheck -match-full-lines -check-prefix MANAGARM %s

``
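
For readers unfamiliar with the macro: C11 lets an implementation define `__STDC_NO_THREADS__` to signal that `<threads.h>` is not provided, and the change above makes Clang define it for DragonFly targets when compiling as C11 or later. A minimal sketch of how portable code might consult the macro follows. The function names `c11_threads_advertised` and `thread_backend` are hypothetical and not taken from the patch, and the check deliberately does not include `<threads.h>` so that it compiles on any host.

```c
/* Returns 1 when the implementation claims C11 or later and has not
   defined __STDC_NO_THREADS__, i.e. <threads.h> is advertised. */
static int c11_threads_advertised(void) {
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_THREADS__)
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical helper: names the threading backend a portable program
   might choose based on the check above. */
static const char *thread_backend(void) {
  return c11_threads_advertised() ? "C11 <threads.h>"
                                  : "fallback (e.g. pthreads)";
}
```

With the patch applied, a build for `x86_64-unknown-dragonfly` with `-std=c11` should take the fallback branch, which matches what the new `DRAGONFLY-STDC` test lines check; with `-std=c99` the macro is not defined, but the `__STDC_VERSION__` guard still selects the fallback.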




https://github.com/llvm/llvm-project/pull/154897


[llvm-branch-commits] [clang] release/21.x: [Driver] DragonFly does not support C11 threads (#154886) (PR #154897)

2025-08-21 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/154897


[llvm-branch-commits] [clang] release/21.x: [Driver] DragonFly does not support C11 threads (#154886) (PR #154897)

2025-08-21 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/154897

Backport 0fff4605922d137252875f072b3fb2973dbf9693

Requested by: @brad0

>From 0fe516b9350e5a08ee24a2be318d0971c1b2b0b3 Mon Sep 17 00:00:00 2001
From: Brad Smith 
Date: Fri, 22 Aug 2025 02:02:52 -0400
Subject: [PATCH] [Driver] DragonFly does not support C11 threads (#154886)

(cherry picked from commit 0fff4605922d137252875f072b3fb2973dbf9693)
---
 clang/lib/Basic/Targets/OSTargets.h | 3 +++
 clang/test/Preprocessor/init.c  | 8 
 2 files changed, 11 insertions(+)

diff --git a/clang/lib/Basic/Targets/OSTargets.h 
b/clang/lib/Basic/Targets/OSTargets.h
index 97b2caa22d8e4..c1a68f464e831 100644
--- a/clang/lib/Basic/Targets/OSTargets.h
+++ b/clang/lib/Basic/Targets/OSTargets.h
@@ -174,6 +174,9 @@ class LLVM_LIBRARY_VISIBILITY DragonFlyBSDTargetInfo
     DefineStd(Builder, "unix", Opts);
     if (this->HasFloat128)
       Builder.defineMacro("__FLOAT128__");
+
+    if (Opts.C11)
+      Builder.defineMacro("__STDC_NO_THREADS__");
   }
 
 public:
diff --git a/clang/test/Preprocessor/init.c b/clang/test/Preprocessor/init.c
index bed39dc3e34dc..7e0df96141364 100644
--- a/clang/test/Preprocessor/init.c
+++ b/clang/test/Preprocessor/init.c
@@ -1622,6 +1622,14 @@
 // RUN: %clang_cc1 -x c -std=c99 -E -dM -ffreestanding 
-triple=amd64-unknown-openbsd < /dev/null | FileCheck -match-full-lines 
-check-prefix OPENBSD-STDC-N %s
 // OPENBSD-STDC-N-NOT:#define __STDC_NO_THREADS__ 1
 //
+// RUN: %clang_cc1 -x c -std=c11 -E -dM -ffreestanding 
-triple=x86_64-unknown-dragonfly < /dev/null | FileCheck -match-full-lines 
-check-prefix DRAGONFLY-STDC %s
+// RUN: %clang_cc1 -x c -std=gnu11 -E -dM -ffreestanding 
-triple=x86_64-unknown-dragonfly < /dev/null | FileCheck -match-full-lines 
-check-prefix DRAGONFLY-STDC %s
+// RUN: %clang_cc1 -x c -std=c17 -E -dM -ffreestanding 
-triple=x86_64-unknown-dragonfly < /dev/null | FileCheck -match-full-lines 
-check-prefix DRAGONFLY-STDC %s
+// DRAGONFLY-STDC:#define __STDC_NO_THREADS__ 1
+//
+// RUN: %clang_cc1 -x c -std=c99 -E -dM -ffreestanding 
-triple=x86_64-unknown-dragonfly < /dev/null | FileCheck -match-full-lines 
-check-prefix DRAGONFLY-STDC-N %s
+// DRAGONFLY-STDC-N-NOT:#define __STDC_NO_THREADS__ 1
+//
 // RUN: %clang_cc1 -triple=aarch64-unknown-managarm-mlibc -E -dM < /dev/null | 
FileCheck -match-full-lines -check-prefix MANAGARM %s
 // RUN: %clang_cc1 -triple=riscv64-unknown-managarm-mlibc -E -dM < /dev/null | 
FileCheck -match-full-lines -check-prefix MANAGARM %s
 // RUN: %clang_cc1 -triple=x86_64-unknown-managarm-mlibc -E -dM < /dev/null | 
FileCheck -match-full-lines -check-prefix MANAGARM %s



[llvm-branch-commits] [clang] release/21.x: [Driver] DragonFly does not support C11 threads (#154886) (PR #154897)

2025-08-21 Thread via llvm-branch-commits

llvmbot wrote:

@devnexen What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/154897