[llvm-branch-commits] [llvm] Add frontend for search (PR #107210)
https://github.com/SamTebbs33 created https://github.com/llvm/llvm-project/pull/107210 None >From 8296e727435492d4a5b49deea76c098d6f54081f Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 4 Sep 2024 11:05:17 +0100 Subject: [PATCH] Add frontend for search --- graphite-demo/frontend.jsx | 56 ++ 1 file changed, 56 insertions(+) create mode 100644 graphite-demo/frontend.jsx diff --git a/graphite-demo/frontend.jsx b/graphite-demo/frontend.jsx new file mode 100644 index 00..dd6a2a3ba66cc5 --- /dev/null +++ b/graphite-demo/frontend.jsx @@ -0,0 +1,56 @@ +import React, { useEffect, useState } from 'react'; + +const TaskSearch = () => { + const [tasks, setTasks] = useState([]); + const [loading, setLoading] = useState(true); + const [error, setError] = useState(null); + const [searchQuery, setSearchQuery] = useState(''); + + useEffect(() => { +setLoading(true); +fetch(`/search?query=${encodeURIComponent(searchQuery)}`) + .then(response => { +if (!response.ok) { + throw new Error('Network response was not ok'); +} +return response.json(); + }) + .then(data => { +setTasks(data); +setLoading(false); + }) + .catch(error => { +setError(error.message); +setLoading(false); + }); + }, [searchQuery]); // Depend on searchQuery + + if (loading) { +return Loading...; + } + + if (error) { +return Error: {error}; + } + + return ( + + Task Search + setSearchQuery(e.target.value)} + /> + +{tasks.map(task => ( + +{task.description} + +))} + + + ); +}; + +export default TaskSearch; \ No newline at end of file ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
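The patch above was flattened by the mailing-list archive and the JSX markup tags were stripped, so the component cannot be reproduced verbatim. As a hedged sketch, the data-fetching behaviour (not the markup) can be reconstructed in plain JavaScript; `fetchImpl` and `fakeFetch` are stand-ins introduced here for illustration, not names from the patch.

```javascript
// Hedged reconstruction of the fetch logic from the patch above.
// Only the data-fetching behaviour is sketched; `fetchImpl` stands in
// for the browser's fetch so the logic can run outside a browser.
async function searchTasks(query, fetchImpl) {
  const response = await fetchImpl(`/search?query=${encodeURIComponent(query)}`);
  if (!response.ok) {
    // Mirrors the patch: any non-2xx response becomes an error.
    throw new Error('Network response was not ok');
  }
  return response.json();
}

// Minimal fake fetch for demonstration.
const fakeFetch = async (url) => ({
  ok: true,
  json: async () => [{ id: 1, description: `results for ${url}` }],
});

searchTasks('fix & test', fakeFetch).then(tasks => {
  console.log(tasks.length); // 1
});
```

Note how `encodeURIComponent` protects the query string: `'fix & test'` becomes `fix%20%26%20test`, so an ampersand in the search text cannot be misread as a second query parameter.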
[llvm-branch-commits] [llvm] Add frontend for search (PR #107210)
SamTebbs33 wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://graphite.dev/docs/merge-pull-requests).

* **#107210** 👈 (this PR)
* **#107209**
* `main`

This stack of pull requests is managed by Graphite (https://stacking.dev/).

https://github.com/llvm/llvm-project/pull/107210
[llvm-branch-commits] [llvm] Add frontend for search (PR #107210)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/107210 >From 4dae516fc2be004f79362b455b835754eeda953d Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 4 Sep 2024 11:05:17 +0100 Subject: [PATCH] Add frontend for search
[llvm-branch-commits] [llvm] Add user search (PR #107211)
https://github.com/SamTebbs33 created https://github.com/llvm/llvm-project/pull/107211 None >From e99c4dca4bfb7bed5c3069e056fb566b9c655eaa Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 4 Sep 2024 11:07:55 +0100 Subject: [PATCH] Add user search --- graphite-demo/frontend.jsx | 23 +-- graphite-demo/server.js| 29 + 2 files changed, 42 insertions(+), 10 deletions(-) diff --git a/graphite-demo/frontend.jsx b/graphite-demo/frontend.jsx index dd6a2a3ba66cc5..10512ee5f98f86 100644 --- a/graphite-demo/frontend.jsx +++ b/graphite-demo/frontend.jsx @@ -1,7 +1,8 @@ import React, { useEffect, useState } from 'react'; -const TaskSearch = () => { +const TaskAndUserSearch = () => { const [tasks, setTasks] = useState([]); + const [users, setUsers] = useState([]); const [loading, setLoading] = useState(true); const [error, setError] = useState(null); const [searchQuery, setSearchQuery] = useState(''); @@ -16,14 +17,15 @@ const TaskSearch = () => { return response.json(); }) .then(data => { -setTasks(data); +setTasks(data.tasks); +setUsers(data.users); setLoading(false); }) .catch(error => { setError(error.message); setLoading(false); }); - }, [searchQuery]); // Depend on searchQuery + }, [searchQuery]); if (loading) { return Loading...; @@ -35,13 +37,14 @@ const TaskSearch = () => { return ( - Task Search + Search Tasks and Users setSearchQuery(e.target.value)} /> + Tasks {tasks.map(task => ( @@ -49,8 +52,16 @@ const TaskSearch = () => { ))} + Users + +{users.map(user => ( + +{user.name} + +))} + ); }; -export default TaskSearch; \ No newline at end of file +export default TaskAndUserSearch; \ No newline at end of file diff --git a/graphite-demo/server.js b/graphite-demo/server.js index cf7ec6507287f8..ff79b7d4915f8d 100644 --- a/graphite-demo/server.js +++ b/graphite-demo/server.js @@ -18,17 +18,38 @@ const tasks = [ } ]; +// Fake data for users +const users = [ + { +id: 101, +name: 'Alice Smith' + }, + { +id: 102, +name: 'Bob Johnson' + }, + { +id: 103, +name: 'Charlie 
Brown' + } +]; + app.get('/search', (req, res) => { // Retrieve the query parameter const query = req.query.query?.toLowerCase() || ''; // Filter tasks based on the query - const filteredTasks = tasks.filter(task => task.description.toLowerCase().includes(query)); + const filteredTasks = tasks.filter(task => +task.description.toLowerCase().includes(query) + ).sort((a, b) => a.description.localeCompare(b.description)); - // Sort the filtered tasks alphabetically by description - const sortedTasks = filteredTasks.sort((a, b) => a.description.localeCompare(b.description)); + // Filter users based on the query + const filteredUsers = users.filter(user => +user.name.toLowerCase().includes(query) + ).sort((a, b) => a.name.localeCompare(b.name)); - res.json(sortedTasks); + // Return both sets of results + res.json({ tasks: filteredTasks, users: filteredUsers }); }); app.listen(port, () => {
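The `/search` handler in the patch above filters and sorts both collections before responding. Its core logic can be sketched as a pure function, exercised without Express; the `users` sample mirrors the fake data in the diff, while the `tasks` entries are illustrative stand-ins.

```javascript
// Standalone sketch of the /search handler's filtering logic from the
// patch above, extracted as a pure function. The `users` data mirrors
// the diff; the `tasks` entries are made up for the demo.
const users = [
  { id: 101, name: 'Alice Smith' },
  { id: 102, name: 'Bob Johnson' },
  { id: 103, name: 'Charlie Brown' },
];
const tasks = [
  { id: 1, description: 'Write docs' },
  { id: 2, description: 'Review branch' },
];

function search(query) {
  const q = (query || '').toLowerCase();
  // Filter case-insensitively, then sort alphabetically, as in the patch.
  const filteredTasks = tasks
    .filter(task => task.description.toLowerCase().includes(q))
    .sort((a, b) => a.description.localeCompare(b.description));
  const filteredUsers = users
    .filter(user => user.name.toLowerCase().includes(q))
    .sort((a, b) => a.name.localeCompare(b.name));
  // Return both sets of results, keyed as the frontend expects.
  return { tasks: filteredTasks, users: filteredUsers };
}

console.log(search('o').users.map(u => u.name));
// [ 'Bob Johnson', 'Charlie Brown' ]
```

Keeping the shape `{ tasks, users }` stable matters here: the frontend half of the stack reads `data.tasks` and `data.users`, so renaming either key would break it silently.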
[llvm-branch-commits] [llvm] Add user search (PR #107211)
SamTebbs33 wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://graphite.dev/docs/merge-pull-requests).

* **#107211** 👈 (this PR)
* **#107210**
* **#107209**
* `main`

This stack of pull requests is managed by Graphite (https://stacking.dev/).

https://github.com/llvm/llvm-project/pull/107211
[llvm-branch-commits] [llvm] Add user search (PR #107211)
https://github.com/SamTebbs33 closed https://github.com/llvm/llvm-project/pull/107211
[llvm-branch-commits] [llvm] Add frontend for search (PR #107210)
https://github.com/SamTebbs33 closed https://github.com/llvm/llvm-project/pull/107210
[llvm-branch-commits] [llvm] 60fda8e - [ARM] Add a pass that re-arranges blocks when there is a backwards WLS branch
Author: Sam Tebbs Date: 2021-01-13T17:23:00Z New Revision: 60fda8ebb6dc4e2ac1cc181c0ab8019c4309cb22 URL: https://github.com/llvm/llvm-project/commit/60fda8ebb6dc4e2ac1cc181c0ab8019c4309cb22 DIFF: https://github.com/llvm/llvm-project/commit/60fda8ebb6dc4e2ac1cc181c0ab8019c4309cb22.diff LOG: [ARM] Add a pass that re-arranges blocks when there is a backwards WLS branch Blocks can be laid out such that a t2WhileLoopStart branches backwards. This is forbidden by the architecture and so it fails to be converted into a low-overhead loop. This new pass checks for these cases and moves the target block, fixing any fall-through that would then be broken. Differential Revision: https://reviews.llvm.org/D92385 Added: llvm/lib/Target/ARM/ARMBlockPlacement.cpp llvm/test/CodeGen/Thumb2/block-placement.mir Modified: llvm/lib/Target/ARM/ARM.h llvm/lib/Target/ARM/ARMTargetMachine.cpp llvm/lib/Target/ARM/CMakeLists.txt llvm/test/CodeGen/ARM/O3-pipeline.ll Removed: diff --git a/llvm/lib/Target/ARM/ARM.h b/llvm/lib/Target/ARM/ARM.h index d8a4e4c31012..f4fdc9803728 100644 --- a/llvm/lib/Target/ARM/ARM.h +++ b/llvm/lib/Target/ARM/ARM.h @@ -37,6 +37,7 @@ class PassRegistry; Pass *createMVETailPredicationPass(); FunctionPass *createARMLowOverheadLoopsPass(); +FunctionPass *createARMBlockPlacementPass(); Pass *createARMParallelDSPPass(); FunctionPass *createARMISelDag(ARMBaseTargetMachine &TM, CodeGenOpt::Level OptLevel); @@ -71,6 +72,7 @@ void initializeThumb2ITBlockPass(PassRegistry &); void initializeMVEVPTBlockPass(PassRegistry &); void initializeMVEVPTOptimisationsPass(PassRegistry &); void initializeARMLowOverheadLoopsPass(PassRegistry &); +void initializeARMBlockPlacementPass(PassRegistry &); void initializeMVETailPredicationPass(PassRegistry &); void initializeMVEGatherScatterLoweringPass(PassRegistry &); void initializeARMSLSHardeningPass(PassRegistry &); diff --git a/llvm/lib/Target/ARM/ARMBlockPlacement.cpp b/llvm/lib/Target/ARM/ARMBlockPlacement.cpp new file mode 100644 index 
..fda05f526335 --- /dev/null +++ b/llvm/lib/Target/ARM/ARMBlockPlacement.cpp @@ -0,0 +1,227 @@ +//===-- ARMBlockPlacement.cpp - ARM block placement pass ===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This pass re-arranges machine basic blocks to suit target requirements. +// Currently it only moves blocks to fix backwards WLS branches. +// +//===--===// + +#include "ARM.h" +#include "ARMBaseInstrInfo.h" +#include "ARMBasicBlockInfo.h" +#include "ARMSubtarget.h" +#include "llvm/CodeGen/MachineFunctionPass.h" +#include "llvm/CodeGen/MachineInstrBuilder.h" +#include "llvm/CodeGen/MachineLoopInfo.h" + +using namespace llvm; + +#define DEBUG_TYPE "arm-block-placement" +#define DEBUG_PREFIX "ARM Block Placement: " + +namespace llvm { +class ARMBlockPlacement : public MachineFunctionPass { +private: + const ARMBaseInstrInfo *TII; + std::unique_ptr BBUtils = nullptr; + MachineLoopInfo *MLI = nullptr; + +public: + static char ID; + ARMBlockPlacement() : MachineFunctionPass(ID) {} + + bool runOnMachineFunction(MachineFunction &MF) override; + void moveBasicBlock(MachineBasicBlock *BB, MachineBasicBlock *After); + bool blockIsBefore(MachineBasicBlock *BB, MachineBasicBlock *Other); + + void getAnalysisUsage(AnalysisUsage &AU) const override { +AU.setPreservesCFG(); +AU.addRequired(); +MachineFunctionPass::getAnalysisUsage(AU); + } +}; + +} // namespace llvm + +FunctionPass *llvm::createARMBlockPlacementPass() { + return new ARMBlockPlacement(); +} + +char ARMBlockPlacement::ID = 0; + +INITIALIZE_PASS(ARMBlockPlacement, DEBUG_TYPE, "ARM block placement", false, +false) + +bool ARMBlockPlacement::runOnMachineFunction(MachineFunction &MF) { + const ARMSubtarget &ST = static_cast(MF.getSubtarget()); + if (!ST.hasLOB()) +return false; + LLVM_DEBUG(dbgs() << DEBUG_PREFIX << "Running on " << 
MF.getName() << "\n"); + MLI = &getAnalysis(); + TII = static_cast(ST.getInstrInfo()); + BBUtils = std::unique_ptr(new ARMBasicBlockUtils(MF)); + MF.RenumberBlocks(); + BBUtils->computeAllBlockSizes(); + BBUtils->adjustBBOffsetsAfter(&MF.front()); + bool Changed = false; + + // Find loops with a backwards branching WLS. + // This requires looping over the loops in the function, checking each + // preheader for a WLS and if its target is before the preheader. If moving + // the target block wouldn't produce another backwards WLS or a new forwards + // LE branch then move the target block after the preh
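The commit above moves a loop's target block so that a `t2WhileLoopStart` no longer branches backwards. The real pass operates on `MachineBasicBlock`s with offset bookkeeping; as a simplified, hedged model, the reordering itself can be shown on a plain array of block names (the block names here are invented for the demo).

```javascript
// Illustrative model of the reordering the ARM block placement pass
// performs. A WLS branch from `preheader` to `target` is backwards
// when `target` is laid out before `preheader`; the fix is to move
// `target` to just after `preheader`. Real code must also repair any
// fall-through this breaks, which this sketch omits.
function fixBackwardsBranch(blocks, preheader, target) {
  const targetIdx = blocks.indexOf(target);
  const preheaderIdx = blocks.indexOf(preheader);
  if (targetIdx === -1 || preheaderIdx === -1 || targetIdx > preheaderIdx) {
    return blocks; // Already a forwards branch; nothing to do.
  }
  // Remove the target block, then reinsert it right after the preheader.
  const reordered = blocks.filter(b => b !== target);
  reordered.splice(reordered.indexOf(preheader) + 1, 0, target);
  return reordered;
}

console.log(fixBackwardsBranch(['target', 'a', 'preheader', 'exit'],
                               'preheader', 'target'));
// [ 'a', 'preheader', 'target', 'exit' ]
```

The architectural constraint motivating this is that a backwards WLS branch is forbidden, so without the move the loop fails to become a low-overhead loop.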
[llvm-branch-commits] [llvm] 5e4480b - [ARM] Don't run the block placement pass at O0
Author: Sam Tebbs Date: 2021-01-15T13:59:29Z New Revision: 5e4480b6c0f02beef5ca7f62c3427031872fcd52 URL: https://github.com/llvm/llvm-project/commit/5e4480b6c0f02beef5ca7f62c3427031872fcd52 DIFF: https://github.com/llvm/llvm-project/commit/5e4480b6c0f02beef5ca7f62c3427031872fcd52.diff LOG: [ARM] Don't run the block placement pass at O0 The block placement pass shouldn't run unless optimisations are enabled. Differential Revision: https://reviews.llvm.org/D94691 Added: Modified: llvm/lib/Target/ARM/ARMBlockPlacement.cpp llvm/lib/Target/ARM/ARMTargetMachine.cpp Removed: diff --git a/llvm/lib/Target/ARM/ARMBlockPlacement.cpp b/llvm/lib/Target/ARM/ARMBlockPlacement.cpp index fda05f526335..20491273ea5d 100644 --- a/llvm/lib/Target/ARM/ARMBlockPlacement.cpp +++ b/llvm/lib/Target/ARM/ARMBlockPlacement.cpp @@ -58,6 +58,8 @@ INITIALIZE_PASS(ARMBlockPlacement, DEBUG_TYPE, "ARM block placement", false, false) bool ARMBlockPlacement::runOnMachineFunction(MachineFunction &MF) { + if (skipFunction(MF.getFunction())) + return false; const ARMSubtarget &ST = static_cast(MF.getSubtarget()); if (!ST.hasLOB()) return false; diff --git a/llvm/lib/Target/ARM/ARMTargetMachine.cpp b/llvm/lib/Target/ARM/ARMTargetMachine.cpp index 51399941629a..237ef54c8339 100644 --- a/llvm/lib/Target/ARM/ARMTargetMachine.cpp +++ b/llvm/lib/Target/ARM/ARMTargetMachine.cpp @@ -553,11 +553,11 @@ void ARMPassConfig::addPreEmitPass() { return MF.getSubtarget().isThumb2(); })); - addPass(createARMBlockPlacementPass()); - - // Don't optimize barriers at -O0. - if (getOptLevel() != CodeGenOpt::None) + // Don't optimize barriers or block placement at -O0. + if (getOptLevel() != CodeGenOpt::None) { +addPass(createARMBlockPlacementPass()); addPass(createARMOptimizeBarriersPass()); + } } void ARMPassConfig::addPreEmitPass2() { ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 1a497ae - [ARM][Block placement] Check the predecessor exists before processing it
Author: Sam Tebbs Date: 2021-01-15T15:45:13Z New Revision: 1a497ae9b83653682d6d20f1ec131394e523375d URL: https://github.com/llvm/llvm-project/commit/1a497ae9b83653682d6d20f1ec131394e523375d DIFF: https://github.com/llvm/llvm-project/commit/1a497ae9b83653682d6d20f1ec131394e523375d.diff LOG: [ARM][Block placement] Check the predecessor exists before processing it Not all machine loops will have a predecessor. so the pass needs to check it before continuing. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D94780 Added: Modified: llvm/lib/Target/ARM/ARMBlockPlacement.cpp llvm/test/CodeGen/Thumb2/block-placement.mir Removed: diff --git a/llvm/lib/Target/ARM/ARMBlockPlacement.cpp b/llvm/lib/Target/ARM/ARMBlockPlacement.cpp index 20491273ea5d4..581b4b9857af3 100644 --- a/llvm/lib/Target/ARM/ARMBlockPlacement.cpp +++ b/llvm/lib/Target/ARM/ARMBlockPlacement.cpp @@ -79,6 +79,8 @@ bool ARMBlockPlacement::runOnMachineFunction(MachineFunction &MF) { // LE branch then move the target block after the preheader. for (auto *ML : *MLI) { MachineBasicBlock *Preheader = ML->getLoopPredecessor(); +if (!Preheader) + continue; for (auto &Terminator : Preheader->terminators()) { if (Terminator.getOpcode() != ARM::t2WhileLoopStart) diff --git a/llvm/test/CodeGen/Thumb2/block-placement.mir b/llvm/test/CodeGen/Thumb2/block-placement.mir index d96a1fb49abbb..ed4a0a6b493d8 100644 --- a/llvm/test/CodeGen/Thumb2/block-placement.mir +++ b/llvm/test/CodeGen/Thumb2/block-placement.mir @@ -25,6 +25,16 @@ entry: unreachable } + + define void @no_preheader(i32 %N, i32 %M, i32* nocapture %a, i32* nocapture %b, i32* nocapture %c) local_unnamed_addr #0 { + entry: +unreachable + } + + declare dso_local i32 @g(...) local_unnamed_addr #1 + + declare dso_local i32 @h(...) local_unnamed_addr #1 + ... --- name:backwards_branch @@ -343,3 +353,91 @@ body: | t2B %bb.1, 14 /* CC::al */, $noreg ... 
+--- +name:no_preheader +body: | + ; CHECK-LABEL: name: no_preheader + ; CHECK: bb.0: + ; CHECK: successors: %bb.2(0x3000), %bb.1(0x5000) + ; CHECK: frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r4, killed $r5, $r7, killed $lr, implicit-def $sp, implicit $sp + ; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 16 + ; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4 + ; CHECK: frame-setup CFI_INSTRUCTION offset $r7, -8 + ; CHECK: frame-setup CFI_INSTRUCTION offset $r5, -12 + ; CHECK: frame-setup CFI_INSTRUCTION offset $r4, -16 + ; CHECK: $r7 = frame-setup tADDrSPi $sp, 2, 14 /* CC::al */, $noreg + ; CHECK: frame-setup CFI_INSTRUCTION def_cfa $r7, 8 + ; CHECK: $r4 = tMOVr killed $r0, 14 /* CC::al */, $noreg + ; CHECK: tBL 14 /* CC::al */, $noreg, @g, csr_aapcs, implicit-def dead $lr, implicit $sp, implicit-def $sp, implicit-def $r0 + ; CHECK: tCMPi8 killed renamable $r0, 0, 14 /* CC::al */, $noreg, implicit-def $cpsr + ; CHECK: t2Bcc %bb.2, 0 /* CC::eq */, killed $cpsr + ; CHECK: bb.1: + ; CHECK: successors: %bb.4(0x8000) + ; CHECK: renamable $r0, dead $cpsr = tMOVi8 4, 14 /* CC::al */, $noreg + ; CHECK: renamable $r5 = t2LDRSHi12 killed renamable $r0, 0, 14 /* CC::al */, $noreg + ; CHECK: t2B %bb.4, 14 /* CC::al */, $noreg + ; CHECK: bb.2: + ; CHECK: successors: %bb.4(0x8000) + ; CHECK: renamable $r5, dead $cpsr = tMOVi8 0, 14 /* CC::al */, $noreg + ; CHECK: t2B %bb.4, 14 /* CC::al */, $noreg + ; CHECK: bb.3: + ; CHECK: successors: %bb.4(0x8000) + ; CHECK: $r0 = tMOVr $r5, 14 /* CC::al */, $noreg + ; CHECK: tBL 14 /* CC::al */, $noreg, @h, csr_aapcs, implicit-def dead $lr, implicit $sp, implicit killed $r0, implicit-def $sp, implicit-def dead $r0 + ; CHECK: bb.4: + ; CHECK: successors: %bb.5(0x0400), %bb.3(0x7c00) + ; CHECK: renamable $r0 = tLDRi renamable $r4, 0, 14 /* CC::al */, $noreg + ; CHECK: tCMPi8 killed renamable $r0, 0, 14 /* CC::al */, $noreg, implicit-def $cpsr + ; CHECK: t2Bcc %bb.3, 1 /* CC::ne */, killed $cpsr + ; CHECK: bb.5: + ; CHECK: 
frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r4, def $r5, def $r7, def $pc + bb.0: +successors: %bb.1(0x3000), %bb.2(0x5000) +liveins: $r0, $r4, $r5, $lr + +frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r4, killed $r5, $r7, killed $lr, implicit-def $sp, implicit $sp +frame-setup CFI_INSTRUCTION def_cfa_offset 16 +frame-setup CFI_INSTRUCTION offset $lr, -4 +frame-setup CFI_INSTRUCTION offset $r7, -8 +frame-setup CFI_INSTRUCTION offset $r5, -12 +frame-setup CFI_INSTRUCTION offset $r4, -16 +$r7 = frame-setup tADDrSPi $sp, 2, 14 /* CC::al */, $noreg +frame-setup CFI_INSTRUCTION def_cfa $r7, 8 +$r4 = tMOVr killed $r0, 14 /* CC::al */, $noreg +tBL 14 /* CC::al */, $nore
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
SamTebbs33 wrote: I've rebased this on top of my PR that adds an intrinsic since that's less fragile to match in the backend. So this should now be ready to have a look at. https://github.com/llvm/llvm-project/pull/100579
[llvm-branch-commits] [llvm] [AArch64] Disallow vscale x 1 partial reductions (PR #125252)
https://github.com/SamTebbs33 edited https://github.com/llvm/llvm-project/pull/125252
[llvm-branch-commits] [llvm] [AArch64] Disallow vscale x 1 partial reductions (PR #125252)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/125252 error: too big or took too long to generate
[llvm-branch-commits] [llvm] release/20.x: [AArch64] Fix op mask detection in performZExtDeinterleaveShuffleCombine (#126054) (PR #126263)
https://github.com/SamTebbs33 approved this pull request. It makes sense to merge this as it fixes a miscompilation. https://github.com/llvm/llvm-project/pull/126263
[llvm-branch-commits] [llvm] release/20.x: [AArch64] Fix op mask detection in performZExtDeinterleaveShuffleCombine (#126054) (PR #126263)
https://github.com/SamTebbs33 edited https://github.com/llvm/llvm-project/pull/126263
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -5026,10 +5026,24 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. +auto VF = VFs[J]; SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -3177,6 +3177,420 @@ for.exit:; preds = %for.body ret i32 %add } +define dso_local void @dotp_high_register_pressure(ptr %a, ptr %b, ptr %sum, i32 %n) #1 { SamTebbs33 wrote: Added. https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/SamTebbs33 created https://github.com/llvm/llvm-project/pull/133090 This PR accounts for scaled reductions in `calculateRegisterUsage` to reflect the fact that the number of lanes in their output is smaller than the VF. >From 6193c2c846710472c7e604ef33a15cda18771328 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 26 Mar 2025 14:01:59 + Subject: [PATCH] [LV] Reduce register usage for scaled reductions --- .../Transforms/Vectorize/LoopVectorize.cpp| 24 +- .../Transforms/Vectorize/VPRecipeBuilder.h| 3 +- llvm/lib/Transforms/Vectorize/VPlan.h | 14 +- .../partial-reduce-dot-product-neon.ll| 60 ++- .../AArch64/partial-reduce-dot-product.ll | 414 ++ 5 files changed, 495 insertions(+), 20 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index c9f314c0ba481..da701ef9ff1a2 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -5026,10 +5026,23 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. 
+auto VF = VFs[J]; +if (auto *ReductionR = dyn_cast(R)) + VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor()); +else if (auto *PartialReductionR = + dyn_cast(R)) + VF = VF.divideCoefficientBy(PartialReductionR->getScaleFactor()); +if (VF != VFs[J]) + LLVM_DEBUG(dbgs() << "LV(REG): Scaled down VF from " << VFs[J] +<< " to " << VF << " for "; + R->dump();); + for (VPValue *DefV : R->definedValues()) { Type *ScalarTy = TypeInfo.inferScalarType(DefV); unsigned ClassID = TTI.getRegisterClassForType(true, ScalarTy); - RegUsage[ClassID] += GetRegUsage(ScalarTy, VFs[J]); + RegUsage[ClassID] += GetRegUsage(ScalarTy, VF); } } } @@ -8963,8 +8976,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe( if (isa(Instr) || isa(Instr)) return tryToWidenMemory(Instr, Operands, Range); - if (getScalingForReduction(Instr)) -return tryToCreatePartialReduction(Instr, Operands); + if (auto ScaleFactor = getScalingForReduction(Instr)) +return tryToCreatePartialReduction(Instr, Operands, ScaleFactor.value()); if (!shouldWiden(Instr, Range)) return nullptr; @@ -8988,7 +9001,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe( VPRecipeBase * VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction, - ArrayRef Operands) { + ArrayRef Operands, + unsigned ScaleFactor) { assert(Operands.size() == 2 && "Unexpected number of operands for partial reduction"); @@ -9021,7 +9035,7 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction, BinOp = Builder.createSelect(Mask, BinOp, Zero, Reduction->getDebugLoc()); } return new VPPartialReductionRecipe(ReductionOpcode, BinOp, Accumulator, - Reduction); + ScaleFactor, Reduction); } void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF, diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h index 334cfbad8bd7c..fd0064a34c4c9 100644 --- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h +++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h 
@@ -178,7 +178,8 @@ class VPRecipeBuilder { /// Create and return a partial reduction recipe for a reduction instruction /// along with binary operation and reduction phi operands. VPRecipeBase *tryToCreatePartialReduction(Instruction *Reduction, -ArrayRef Operands); +ArrayRef Operands, +unsigned ScaleFactor); /// Set the recipe created for given ingredient. void setRecipe(Instruction *I, VPRecipeBase *R) { diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h index 80b3d2a760293..d84efb1bd6850 100644 --- a/llvm/lib/Transforms/Vectorize/VPlan.h +++ b/llvm/lib/Transforms/Vectorize/VPlan.h @@ -2001,6 +2001,8 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe, /// Generate the phi/select nodes. void execute(VPTransformState &State) override; + unsigned getVFScaleFactor() const { return VFScaleFactor; } + #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) /// Print the recipe. void print(ra
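The patch above divides the VF by a recipe's scale factor before charging register usage, because a scaled reduction produces fewer lanes than the vectorization factor. As a hedged arithmetic sketch (the recipe shapes and the 128-bit register width below are assumptions for the demo, not values taken from LLVM):

```javascript
// Illustrative model of the register-usage adjustment in the patch
// above: a partial reduction with scale factor k produces VF/k lanes,
// so it should be charged registers for VF/k elements, not VF.
function registersNeeded(vf, elementBits, registerWidthBits) {
  return Math.ceil((vf * elementBits) / registerWidthBits);
}

function usageForRecipe(recipe, vf, registerWidthBits = 128) {
  // Scaled reductions/phis emit fewer lanes than the vectorization factor.
  const effectiveVF = recipe.scaleFactor ? vf / recipe.scaleFactor : vf;
  return registersNeeded(effectiveVF, recipe.elementBits, registerWidthBits);
}

// A dot-product style partial reduction: VF=16 i8 inputs accumulate
// into VF/4 = 4 i32 lanes, so one 128-bit register instead of four.
console.log(usageForRecipe({ elementBits: 32, scaleFactor: 4 }, 16)); // 1
console.log(usageForRecipe({ elementBits: 32 }, 16));                 // 4
```

Without the adjustment, the cost model would over-count registers for partial reductions and could reject a VF that in fact fits the register file, which is the pressure problem the PR addresses.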
[llvm-branch-commits] [llvm] [LoopVectorizer] Prune VFs based on plan register pressure (PR #132190)
@@ -7772,12 +7551,23 @@ VectorizationFactor LoopVectorizationPlanner::computeBestVF() { InstructionCost Cost = cost(*P, VF); VectorizationFactor CurrentFactor(VF, Cost, ScalarCost); - if (isMoreProfitable(CurrentFactor, BestFactor)) -BestFactor = CurrentFactor; - // If profitable add it to ProfitableVF list. if (isMoreProfitable(CurrentFactor, ScalarFactor)) ProfitableVFs.push_back(CurrentFactor); SamTebbs33 wrote: Thanks for spotting that, done. https://github.com/llvm/llvm-project/pull/132190
[llvm-branch-commits] [llvm] [LoopVectorizer] Prune VFs based on plan register pressure (PR #132190)
@@ -7759,7 +7535,10 @@ VectorizationFactor LoopVectorizationPlanner::computeBestVF() { } for (auto &P : VPlans) { -for (ElementCount VF : P->vectorFactors()) { +SmallVector VFs(P->vectorFactors()); +auto RUs = ::calculateRegisterUsage(*P, VFs, TTI); +for (unsigned I = 0; I < VFs.size(); I++) { + auto VF = VFs[I]; SamTebbs33 wrote: Thanks for the suggestion, done. https://github.com/llvm/llvm-project/pull/132190
[llvm-branch-commits] [clang] [clang-tools-extra] [compiler-rt] [flang] [libc] [libcxx] [lldb] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
[Commit author list from the rebased patch stack omitted: several hundred names, MIME-encoded and truncated in the archive.]
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/SamTebbs33 edited https://github.com/llvm/llvm-project/pull/133090 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Prune VFs based on plan register pressure (PR #132190)
https://github.com/SamTebbs33 edited https://github.com/llvm/llvm-project/pull/132190
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -253,38 +253,38 @@ define i64 @not_dotp_i8_to_i64_has_neon_dotprod(ptr readonly %a, ptr readonly %b
 ; CHECK-MAXBW-SAME: ptr readonly [[A:%.*]], ptr readonly [[B:%.*]]) #[[ATTR1:[0-9]+]] {
 ; CHECK-MAXBW-NEXT:  entry:
 ; CHECK-MAXBW-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-MAXBW-NEXT:    [[TMP1:%.*]] = mul i64 [[TMP0]], 8
+; CHECK-MAXBW-NEXT:    [[TMP1:%.*]] = mul i64 [[TMP0]], 16
 ; CHECK-MAXBW-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK-MAXBW:       vector.ph:
 ; CHECK-MAXBW-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-MAXBW-NEXT:    [[TMP3:%.*]] = mul i64 [[TMP2]], 8
+; CHECK-MAXBW-NEXT:    [[TMP3:%.*]] = mul i64 [[TMP2]], 16
 ; CHECK-MAXBW-NEXT:    [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP3]]
 ; CHECK-MAXBW-NEXT:    [[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]]
 ; CHECK-MAXBW-NEXT:    [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-MAXBW-NEXT:    [[TMP5:%.*]] = mul i64 [[TMP4]], 8
+; CHECK-MAXBW-NEXT:    [[TMP5:%.*]] = mul i64 [[TMP4]], 16
 ; CHECK-MAXBW-NEXT:    [[TMP6:%.*]] = getelementptr i8, ptr [[A]], i64 [[N_VEC]]
 ; CHECK-MAXBW-NEXT:    [[TMP7:%.*]] = getelementptr i8, ptr [[B]], i64 [[N_VEC]]
 ; CHECK-MAXBW-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK-MAXBW:       vector.body:
 ; CHECK-MAXBW-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-MAXBW-NEXT:    [[VEC_PHI:%.*]] = phi <vscale x 8 x i64> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
+; CHECK-MAXBW-NEXT:    [[VEC_PHI:%.*]] = phi <vscale x 2 x i64> [ zeroinitializer, [[VECTOR_PH]] ], [ [[PARTIAL_REDUCE:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-MAXBW-NEXT:    [[TMP8:%.*]] = add i64 [[INDEX]], 0
 ; CHECK-MAXBW-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP8]]
 ; CHECK-MAXBW-NEXT:    [[TMP9:%.*]] = add i64 [[INDEX]], 0
 ; CHECK-MAXBW-NEXT:    [[NEXT_GEP1:%.*]] = getelementptr i8, ptr [[B]], i64 [[TMP9]]
 ; CHECK-MAXBW-NEXT:    [[TMP10:%.*]] = getelementptr i8, ptr [[NEXT_GEP]], i32 0
-; CHECK-MAXBW-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 8 x i8>, ptr [[TMP10]], align 1
-; CHECK-MAXBW-NEXT:    [[TMP11:%.*]] = zext <vscale x 8 x i8> [[WIDE_LOAD]] to <vscale x 8 x i64>
+; CHECK-MAXBW-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 16 x i8>, ptr [[TMP10]], align 1
 ; CHECK-MAXBW-NEXT:    [[TMP12:%.*]] = getelementptr i8, ptr [[NEXT_GEP1]], i32 0
-; CHECK-MAXBW-NEXT:    [[WIDE_LOAD2:%.*]] = load <vscale x 8 x i8>, ptr [[TMP12]], align 1
-; CHECK-MAXBW-NEXT:    [[TMP13:%.*]] = zext <vscale x 8 x i8> [[WIDE_LOAD2]] to <vscale x 8 x i64>
-; CHECK-MAXBW-NEXT:    [[TMP14:%.*]] = mul nuw nsw <vscale x 8 x i64> [[TMP13]], [[TMP11]]
-; CHECK-MAXBW-NEXT:    [[TMP15]] = add <vscale x 8 x i64> [[TMP14]], [[VEC_PHI]]
+; CHECK-MAXBW-NEXT:    [[WIDE_LOAD2:%.*]] = load <vscale x 16 x i8>, ptr [[TMP12]], align 1
+; CHECK-MAXBW-NEXT:    [[TMP15:%.*]] = zext <vscale x 16 x i8> [[WIDE_LOAD2]] to <vscale x 16 x i64>
+; CHECK-MAXBW-NEXT:    [[TMP13:%.*]] = zext <vscale x 16 x i8> [[WIDE_LOAD]] to <vscale x 16 x i64>
+; CHECK-MAXBW-NEXT:    [[TMP14:%.*]] = mul nuw nsw <vscale x 16 x i64> [[TMP15]], [[TMP13]]
+; CHECK-MAXBW-NEXT:    [[PARTIAL_REDUCE]] = call <vscale x 2 x i64> @llvm.experimental.vector.partial.reduce.add.nxv2i64.nxv16i64(<vscale x 2 x i64> [[VEC_PHI]], <vscale x 16 x i64> [[TMP14]])

SamTebbs33 wrote:

Ah, it looks like the cost that previously stopped it from choosing a 16i8 -> 2i64 partial reduction isn't sufficiently high now that the extend cost is hidden. I've made this permutation invalid.

https://github.com/llvm/llvm-project/pull/136173
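For readers unfamiliar with the intrinsic in the new CHECK lines, a scalar model of `llvm.experimental.vector.partial.reduce.add` may help. This is a sketch with fixed widths instead of scalable vectors; the lane-to-accumulator mapping is target-defined in LLVM, and the round-robin mapping used here is just one valid choice.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a 16-lane -> 2-lane partial add reduction, in the spirit of
// llvm.experimental.vector.partial.reduce.add.nxv2i64.nxv16i64 (here with
// fixed rather than scalable widths). Each lane of the wide input is folded
// into one lane of the narrow accumulator; the full scalar sum is only formed
// by a final reduction after the loop.
std::array<int64_t, 2> partialReduceAdd(std::array<int64_t, 2> Acc,
                                        const std::array<int64_t, 16> &Wide) {
  for (std::size_t I = 0; I < Wide.size(); ++I)
    Acc[I % Acc.size()] += Wide[I]; // round-robin lane mapping (one valid choice)
  return Acc;
}
```

Whatever mapping a target picks, the per-lane sums always add up to the same scalar total, which is why the mapping can be left unspecified.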
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2376,6 +2327,59 @@ class VPReductionRecipe : public VPRecipeWithIRFlags {
   }
 };

+/// A recipe for forming partial reductions. In the loop, an accumulator and
+/// vector operand are added together and passed to the next iteration as the
+/// next accumulator. After the loop body, the accumulator is reduced to a
+/// scalar value.
+class VPPartialReductionRecipe : public VPReductionRecipe {

SamTebbs33 wrote:

Done.

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -219,6 +219,8 @@ class TargetTransformInfo {
   /// Get the kind of extension that an instruction represents.
   static PartialReductionExtendKind
   getPartialReductionExtendKind(Instruction *I);
+  static PartialReductionExtendKind
+  getPartialReductionExtendKind(Instruction::CastOps ExtOpcode);

SamTebbs33 wrote:

Using the `CastOps` one in the other is a good idea. Done.

https://github.com/llvm/llvm-project/pull/136173
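The refactor agreed on above — implementing the `Instruction *` overload in terms of the `CastOps` one — can be sketched in standalone form. The enums and `Inst` struct below are stand-ins for the real LLVM types (`llvm::CastInst`, `TargetTransformInfo::PartialReductionExtendKind`), not the actual API.

```cpp
#include <cassert>

// Stand-in types for the sketch; the real code uses llvm::Instruction,
// llvm::CastInst and TargetTransformInfo::PartialReductionExtendKind.
enum class CastOp { ZExt, SExt, Trunc };
enum class PRExtKind { None, SignExtend, ZeroExtend };

struct Inst {
  bool IsCast = false;
  CastOp Op = CastOp::Trunc;
};

// Opcode-based overload: the single source of truth for the mapping.
PRExtKind getPartialReductionExtendKind(CastOp Op) {
  switch (Op) {
  case CastOp::ZExt:
    return PRExtKind::ZeroExtend;
  case CastOp::SExt:
    return PRExtKind::SignExtend;
  default:
    return PRExtKind::None;
  }
}

// Instruction-based overload delegates instead of duplicating the switch.
PRExtKind getPartialReductionExtendKind(const Inst &I) {
  if (!I.IsCast)
    return PRExtKind::None;
  return getPartialReductionExtendKind(I.Op);
}
```

Keeping the mapping in one switch means a future extend kind only has to be added in one place.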
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2056,55 +2056,6 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   }
 };

-/// A recipe for forming partial reductions.

SamTebbs33 wrote:

I don't think I could make it an NFC change, since to conform to `VPReductionRecipe`, the accumulator and binop have to be swapped around.

https://github.com/llvm/llvm-project/pull/136173
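A toy illustration of why that swap is observable (hypothetical, heavily simplified — these are not the LLVM classes): the base reduction layout stores the accumulator (chain) operand first, while the old standalone partial-reduction recipe stored the bundled binop first, so conforming to the base class reorders operands rather than leaving behaviour untouched.

```cpp
#include <array>
#include <cassert>

// Toy model (not the LLVM classes) of the operand reordering mentioned above.
// The base reduction layout stores the accumulator (chain) operand first.
struct ReductionLayout {
  std::array<int, 2> Ops; // {accumulator, vector/binop operand}, IDs only
  int chainOp() const { return Ops[0]; }
  int vecOp() const { return Ops[1]; }
};

// Adapting the old {binop, accumulator} order to the base-class layout means
// swapping the two operands -- an observable change, hence not NFC.
ReductionLayout fromOldPartialReductionLayout(int BinOpId, int AccId) {
  return ReductionLayout{{AccId, BinOpId}};
}
```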
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
https://github.com/SamTebbs33 created https://github.com/llvm/llvm-project/pull/136997 This PR adds support for extensions of different signedness to VPMulAccumulateReductionRecipe and allows such partial reductions to be bundled into that class. >From 10c4727074a7f5b4502ad08dc655be8fa5ffa3d2 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 23 Apr 2025 13:16:38 +0100 Subject: [PATCH] [LoopVectorizer] Bundle partial reductions with different extensions This PR adds support for extensions of different signedness to VPMulAccumulateReductionRecipe and allows such partial reductions to be bundled into that class. --- llvm/lib/Transforms/Vectorize/VPlan.h | 42 +- .../lib/Transforms/Vectorize/VPlanRecipes.cpp | 27 ++--- .../Transforms/Vectorize/VPlanTransforms.cpp | 25 - .../partial-reduce-dot-product-mixed.ll | 56 +-- .../LoopVectorize/AArch64/vplan-printing.ll | 29 +- 5 files changed, 99 insertions(+), 80 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h index 20d272e69e6e7..e11f608d068da 100644 --- a/llvm/lib/Transforms/Vectorize/VPlan.h +++ b/llvm/lib/Transforms/Vectorize/VPlan.h @@ -2493,11 +2493,13 @@ class VPExtendedReductionRecipe : public VPReductionRecipe { /// recipe is abstract and needs to be lowered to concrete recipes before /// codegen. The Operands are {ChainOp, VecOp1, VecOp2, [Condition]}. class VPMulAccumulateReductionRecipe : public VPReductionRecipe { - /// Opcode of the extend recipe. - Instruction::CastOps ExtOp; + /// Opcodes of the extend recipes. + Instruction::CastOps ExtOp0; + Instruction::CastOps ExtOp1; - /// Non-neg flag of the extend recipe. - bool IsNonNeg = false; + /// Non-neg flags of the extend recipe. 
+ bool IsNonNeg0 = false; + bool IsNonNeg1 = false; Type *ResultTy; @@ -2512,7 +2514,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { MulAcc->getCondOp(), MulAcc->isOrdered(), WrapFlagsTy(MulAcc->hasNoUnsignedWrap(), MulAcc->hasNoSignedWrap()), MulAcc->getDebugLoc()), -ExtOp(MulAcc->getExtOpcode()), IsNonNeg(MulAcc->isNonNeg()), +ExtOp0(MulAcc->getExt0Opcode()), ExtOp1(MulAcc->getExt1Opcode()), +IsNonNeg0(MulAcc->isNonNeg0()), IsNonNeg1(MulAcc->isNonNeg1()), ResultTy(MulAcc->getResultType()), IsPartialReduction(MulAcc->isPartialReduction()) {} @@ -2526,7 +2529,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { R->getCondOp(), R->isOrdered(), WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()), R->getDebugLoc()), -ExtOp(Ext0->getOpcode()), IsNonNeg(Ext0->isNonNeg()), +ExtOp0(Ext0->getOpcode()), ExtOp1(Ext1->getOpcode()), +IsNonNeg0(Ext0->isNonNeg()), IsNonNeg1(Ext1->isNonNeg()), ResultTy(ResultTy), IsPartialReduction(isa(R)) { assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) == @@ -2542,7 +2546,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { R->getCondOp(), R->isOrdered(), WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()), R->getDebugLoc()), -ExtOp(Instruction::CastOps::CastOpsEnd) { +ExtOp0(Instruction::CastOps::CastOpsEnd), +ExtOp1(Instruction::CastOps::CastOpsEnd) { assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) == Instruction::Add && "The reduction instruction in MulAccumulateReductionRecipe must be " @@ -2586,19 +2591,26 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { VPValue *getVecOp1() const { return getOperand(2); } /// Return if this MulAcc recipe contains extend instructions. - bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; } + bool isExtended() const { return ExtOp0 != Instruction::CastOps::CastOpsEnd; } /// Return if the operands of mul instruction come from same extend. 
- bool isSameExtend() const { return getVecOp0() == getVecOp1(); } + bool isSameExtendVal() const { return getVecOp0() == getVecOp1(); } - /// Return the opcode of the underlying extend. - Instruction::CastOps getExtOpcode() const { return ExtOp; } + /// Return the opcode of the underlying extends. + Instruction::CastOps getExt0Opcode() const { return ExtOp0; } + Instruction::CastOps getExt1Opcode() const { return ExtOp1; } + + /// Return if the first extend's opcode is ZExt. + bool isZExt0() const { return ExtOp0 == Instruction::CastOps::ZExt; } + + /// Return if the second extend's opcode is ZExt. + bool isZExt1() const { return ExtOp1 == Instruction::CastOps::ZExt; } - /// Return if the extend opcode is ZExt. - bool isZExt() const { return ExtOp == Instruction::CastOps::ZExt; } + /// Return the non negative flag of the first ext recipe. + bool isNonNeg0() const { return IsNonNe
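The semantics the patch above is after — independent extend kinds per mul operand (`ExtOp0`/`ExtOp1`) — can be shown with a small scalar model. This is a sketch: the real recipe operates on vectors, and the 8-bit-to-64-bit widths here are illustrative. Mixed signedness is what enables, for example, the unsigned-by-signed dot products AArch64's `usdot` computes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ExtKind { ZExt, SExt };

// Extend an 8-bit lane to 64 bits with an explicit signedness, mirroring the
// per-operand ExtOp0/ExtOp1 fields the patch introduces.
int64_t extendLane(uint8_t V, ExtKind K) {
  return K == ExtKind::SExt ? static_cast<int64_t>(static_cast<int8_t>(V))
                            : static_cast<int64_t>(V);
}

// Scalar model of the bundled multiply-accumulate reduction: extend each
// operand with its own kind, multiply, and accumulate.
int64_t mulAccReduce(const std::vector<uint8_t> &A,
                     const std::vector<uint8_t> &B, ExtKind KindA,
                     ExtKind KindB, int64_t Acc) {
  for (std::size_t I = 0; I < A.size(); ++I)
    Acc += extendLane(A[I], KindA) * extendLane(B[I], KindB);
  return Acc;
}
```

With a single shared extend kind (the situation before this patch), the sext/zext mix in the first assertion below could not be expressed by one bundled recipe.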
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -986,11 +986,23 @@ InstructionCost TargetTransformInfo::getShuffleCost(
 TargetTransformInfo::PartialReductionExtendKind
 TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
-  if (isa<SExtInst>(I))
-    return PR_SignExtend;
-  if (isa<ZExtInst>(I))
+  auto *Cast = dyn_cast<CastInst>(I);
+  if (!Cast)
+    return PR_None;
+  return getPartialReductionExtendKind(Cast->getOpcode());
+}
+
+TargetTransformInfo::PartialReductionExtendKind
+TargetTransformInfo::getPartialReductionExtendKind(
+    Instruction::CastOps ExtOpcode) {
+  switch (ExtOpcode) {
+  case Instruction::CastOps::ZExt:
     return PR_ZeroExtend;
-  return PR_None;
+  case Instruction::CastOps::SExt:
+    return PR_SignExtend;
+  default:
+    return PR_None;

SamTebbs33 wrote:

Done.

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -986,11 +986,23 @@ InstructionCost TargetTransformInfo::getShuffleCost(
 TargetTransformInfo::PartialReductionExtendKind
 TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
-  if (isa<SExtInst>(I))
-    return PR_SignExtend;
-  if (isa<ZExtInst>(I))
+  auto *Cast = dyn_cast<CastInst>(I);
+  if (!Cast)
+    return PR_None;
+  return getPartialReductionExtendKind(Cast->getOpcode());

SamTebbs33 wrote:

Done.

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2432,12 +2437,40 @@ static void tryToCreateAbstractReductionRecipe(VPReductionRecipe *Red,
   Red->replaceAllUsesWith(AbstractR);
 }

+/// This function tries to create an abstract recipe from a partial reduction to
+/// hide its mul and extends from cost estimation.
+static void
+tryToCreateAbstractPartialReductionRecipe(VPPartialReductionRecipe *PRed) {

SamTebbs33 wrote:

At this point we've already created the partial reduction and clamped the range, so I don't think we need to do any costing (like `tryToMatchAndCreateMulAccumulateReduction` does with `getMulAccReductionCost`), since we already know it's worthwhile (see `getScaledReductions` in LoopVectorize.cpp). This part of the code just puts the partial reduction inside the abstract recipe, which shouldn't need to consider any costing.

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2432,12 +2437,40 @@ static void tryToCreateAbstractReductionRecipe(VPReductionRecipe *Red,
   Red->replaceAllUsesWith(AbstractR);
 }

+/// This function tries to create an abstract recipe from a partial reduction to
+/// hide its mul and extends from cost estimation.
+static void
+tryToCreateAbstractPartialReductionRecipe(VPPartialReductionRecipe *PRed) {
+  if (PRed->getOpcode() != Instruction::Add)
+    return;
+
+  VPRecipeBase *BinOpR = PRed->getBinOp()->getDefiningRecipe();
+  auto *BinOp = dyn_cast<VPWidenRecipe>(BinOpR);
+  if (!BinOp || BinOp->getOpcode() != Instruction::Mul)
+    return;

SamTebbs33 wrote:

Done :+1:.

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2056,55 +2056,6 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   }
 };

-/// A recipe for forming partial reductions.

SamTebbs33 wrote:

I've pre-committed the NFC, but rebasing Elvis's changes on top of that has been pretty challenging considering the number of commits on that branch. So I will cherry-pick the NFC on to this branch and it'll just go away once Elvis's PR lands and I rebase this PR on top of main.

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -4923,9 +4923,7 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
       return Invalid;
     break;
   case 16:
-    if (AccumEVT == MVT::i64)
-      Cost *= 2;
-    else if (AccumEVT != MVT::i32)
+    if (AccumEVT != MVT::i32)

SamTebbs33 wrote:

Good spot. Done.

https://github.com/llvm/llvm-project/pull/136173
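The effect of the hunk above on the AArch64 partial-reduction cost table can be modelled as follows. This is a sketch with a made-up base cost and a reduced set of types, not the real `AArch64TTIImpl` logic: with the extends bundled into the recipe and hidden from costing, a 16-element i8 input remains valid only for an i32 accumulator, and the old path that doubled the cost for an i64 accumulator is gone.

```cpp
#include <cassert>
#include <optional>

enum class EVT { I32, I64 };

// Hypothetical cost model mirroring the diff above: returns the (made-up)
// cost for a 16-element i8 partial reduction, or nullopt for "Invalid".
std::optional<unsigned> partialReductionCost16xI8(EVT AccumEVT) {
  unsigned Cost = 1; // placeholder base cost
  if (AccumEVT != EVT::I32)
    return std::nullopt; // previously AccumEVT == i64 doubled Cost instead
  return Cost;
}
```

Returning "Invalid" rather than a doubled cost is what makes the 16i8 -> 2i64 permutation discussed earlier in the thread unselectable.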
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
@@ -2493,11 +2493,13 @@ class VPExtendedReductionRecipe : public VPReductionRecipe {
 /// recipe is abstract and needs to be lowered to concrete recipes before
 /// codegen. The Operands are {ChainOp, VecOp1, VecOp2, [Condition]}.
 class VPMulAccumulateReductionRecipe : public VPReductionRecipe {
-  /// Opcode of the extend recipe.
-  Instruction::CastOps ExtOp;
+  /// Opcodes of the extend recipes.

SamTebbs33 wrote:

I like that, thanks. Added.

https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
@@ -2438,14 +2438,14 @@ VPMulAccumulateReductionRecipe::computeCost(ElementCount VF,
     return Ctx.TTI.getPartialReductionCost(
         Instruction::Add, Ctx.Types.inferScalarType(getVecOp0()),
         Ctx.Types.inferScalarType(getVecOp1()), getResultType(), VF,
-        TTI::getPartialReductionExtendKind(getExtOpcode()),
-        TTI::getPartialReductionExtendKind(getExtOpcode()), Instruction::Mul);
+        TTI::getPartialReductionExtendKind(getExt0Opcode()),
+        TTI::getPartialReductionExtendKind(getExt1Opcode()), Instruction::Mul);
   }

   Type *RedTy = Ctx.Types.inferScalarType(this);
   auto *SrcVecTy =
       cast<VectorType>(toVectorTy(Ctx.Types.inferScalarType(getVecOp0()), VF));
-  return Ctx.TTI.getMulAccReductionCost(isZExt(), RedTy, SrcVecTy,
+  return Ctx.TTI.getMulAccReductionCost(isZExt0(), RedTy, SrcVecTy,

SamTebbs33 wrote:

I started off by modifying the TTI hook but found that it wasn't actually necessary, since only partial reductions make use of the differing signedness and they don't use this hook. If someone is interested in getting mul-acc-reduce generated with different extensions, they can do the investigation needed for costing, but I think it's outside the scope of this work.

https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
SamTebbs33 wrote: Yeah that's the case :). Let me know if you have any issues applying it after applying 113903 too. https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -427,6 +428,29 @@ Value *VPInstruction::generate(VPTransformState &State) {
         {PredTy, ScalarTC->getType()}, {VIVElem0, ScalarTC}, nullptr, Name);
   }
+  // Count the number of bits set in each lane and reduce the result to a scalar
+  case VPInstruction::PopCount: {
+    Value *Op = State.get(getOperand(0));
+    auto *VT = Op->getType();

SamTebbs33 wrote:

Done.

https://github.com/llvm/llvm-project/pull/100579
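A scalar model of the `PopCount` case quoted above: given a predicate vector (one boolean per lane), count the active lanes and produce a single scalar. The quoted snippet builds the equivalent with IR intrinsics; a plain loop stands in for that here.

```cpp
#include <cstdint>
#include <vector>

// Count the number of active (true) lanes of a predicate vector and reduce
// the result to a single scalar -- the semantics of VPInstruction::PopCount.
uint64_t popCountLanes(const std::vector<bool> &Mask) {
  uint64_t Count = 0;
  for (bool Lane : Mask)
    Count += Lane ? 1 : 0;
  return Count;
}
```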
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -418,7 +418,13 @@ class LoopVectorizationPlanner {
   /// Build VPlans for the specified \p UserVF and \p UserIC if they are
   /// non-zero or all applicable candidate VFs otherwise. If vectorization and
   /// interleaving should be avoided up-front, no plans are generated.
-  void plan(ElementCount UserVF, unsigned UserIC);
+  /// RTChecks is a list of pointer pairs that should be checked for aliasing,
+  /// setting HasAliasMask to true in the case that an alias mask is generated

SamTebbs33 wrote:

Done, thanks.

https://github.com/llvm/llvm-project/pull/100579
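The idea behind the `RTChecks` pointer pairs can be sketched as follows. This is a simplified model of an alias lane mask, not the exact predicate the PR emits; the overlap rule below (a lane is safe when the pointer distance covers at least that many elements, and equal pointers are treated as lane-wise safe) is an illustrative assumption. The point is that instead of bailing out to a scalar loop when two pointers might overlap, only the lanes whose accesses could conflict are masked off.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model of an alias lane mask for a pointer pair (A, B): lane I is
// enabled only if accessing element I through both pointers cannot conflict
// within one vector iteration, under the assumed overlap rule above.
std::vector<bool> aliasLaneMask(uintptr_t A, uintptr_t B, std::size_t EltSize,
                                std::size_t VF) {
  std::vector<bool> Mask(VF, true);
  uintptr_t Dist = A > B ? A - B : B - A;
  if (Dist == 0)
    return Mask; // identical pointers: each lane touches its own element
  std::size_t SafeLanes = Dist / EltSize; // lanes below the overlap distance
  for (std::size_t I = SafeLanes; I < VF; ++I)
    Mask[I] = false;
  return Mask;
}
```

When the pointers are far apart, every lane stays enabled and the loop runs at full width; partial overlap only disables the trailing lanes.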
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/136997 >From 10c4727074a7f5b4502ad08dc655be8fa5ffa3d2 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 23 Apr 2025 13:16:38 +0100 Subject: [PATCH 1/3] [LoopVectorizer] Bundle partial reductions with different extensions This PR adds support for extensions of different signedness to VPMulAccumulateReductionRecipe and allows such partial reductions to be bundled into that class. --- llvm/lib/Transforms/Vectorize/VPlan.h | 42 +- .../lib/Transforms/Vectorize/VPlanRecipes.cpp | 27 ++--- .../Transforms/Vectorize/VPlanTransforms.cpp | 25 - .../partial-reduce-dot-product-mixed.ll | 56 +-- .../LoopVectorize/AArch64/vplan-printing.ll | 29 +- 5 files changed, 99 insertions(+), 80 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h index 20d272e69e6e7..e11f608d068da 100644 --- a/llvm/lib/Transforms/Vectorize/VPlan.h +++ b/llvm/lib/Transforms/Vectorize/VPlan.h @@ -2493,11 +2493,13 @@ class VPExtendedReductionRecipe : public VPReductionRecipe { /// recipe is abstract and needs to be lowered to concrete recipes before /// codegen. The Operands are {ChainOp, VecOp1, VecOp2, [Condition]}. class VPMulAccumulateReductionRecipe : public VPReductionRecipe { - /// Opcode of the extend recipe. - Instruction::CastOps ExtOp; + /// Opcodes of the extend recipes. + Instruction::CastOps ExtOp0; + Instruction::CastOps ExtOp1; - /// Non-neg flag of the extend recipe. - bool IsNonNeg = false; + /// Non-neg flags of the extend recipe. 
+ bool IsNonNeg0 = false; + bool IsNonNeg1 = false; Type *ResultTy; @@ -2512,7 +2514,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { MulAcc->getCondOp(), MulAcc->isOrdered(), WrapFlagsTy(MulAcc->hasNoUnsignedWrap(), MulAcc->hasNoSignedWrap()), MulAcc->getDebugLoc()), -ExtOp(MulAcc->getExtOpcode()), IsNonNeg(MulAcc->isNonNeg()), +ExtOp0(MulAcc->getExt0Opcode()), ExtOp1(MulAcc->getExt1Opcode()), +IsNonNeg0(MulAcc->isNonNeg0()), IsNonNeg1(MulAcc->isNonNeg1()), ResultTy(MulAcc->getResultType()), IsPartialReduction(MulAcc->isPartialReduction()) {} @@ -2526,7 +2529,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { R->getCondOp(), R->isOrdered(), WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()), R->getDebugLoc()), -ExtOp(Ext0->getOpcode()), IsNonNeg(Ext0->isNonNeg()), +ExtOp0(Ext0->getOpcode()), ExtOp1(Ext1->getOpcode()), +IsNonNeg0(Ext0->isNonNeg()), IsNonNeg1(Ext1->isNonNeg()), ResultTy(ResultTy), IsPartialReduction(isa(R)) { assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) == @@ -2542,7 +2546,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { R->getCondOp(), R->isOrdered(), WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()), R->getDebugLoc()), -ExtOp(Instruction::CastOps::CastOpsEnd) { +ExtOp0(Instruction::CastOps::CastOpsEnd), +ExtOp1(Instruction::CastOps::CastOpsEnd) { assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) == Instruction::Add && "The reduction instruction in MulAccumulateReductionRecipe must be " @@ -2586,19 +2591,26 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { VPValue *getVecOp1() const { return getOperand(2); } /// Return if this MulAcc recipe contains extend instructions. - bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; } + bool isExtended() const { return ExtOp0 != Instruction::CastOps::CastOpsEnd; } /// Return if the operands of mul instruction come from same extend. 
- bool isSameExtend() const { return getVecOp0() == getVecOp1(); } + bool isSameExtendVal() const { return getVecOp0() == getVecOp1(); } - /// Return the opcode of the underlying extend. - Instruction::CastOps getExtOpcode() const { return ExtOp; } + /// Return the opcode of the underlying extends. + Instruction::CastOps getExt0Opcode() const { return ExtOp0; } + Instruction::CastOps getExt1Opcode() const { return ExtOp1; } + + /// Return if the first extend's opcode is ZExt. + bool isZExt0() const { return ExtOp0 == Instruction::CastOps::ZExt; } + + /// Return if the second extend's opcode is ZExt. + bool isZExt1() const { return ExtOp1 == Instruction::CastOps::ZExt; } - /// Return if the extend opcode is ZExt. - bool isZExt() const { return ExtOp == Instruction::CastOps::ZExt; } + /// Return the non negative flag of the first ext recipe. + bool isNonNeg0() const { return IsNonNeg0; } - /// Return the non negative flag of the ext recipe. - bool isNonNeg() const { return IsNonNeg; } + /// Return the non negative flag of the second
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/133090 >From 6193c2c846710472c7e604ef33a15cda18771328 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 26 Mar 2025 14:01:59 + Subject: [PATCH 1/3] [LV] Reduce register usage for scaled reductions --- .../Transforms/Vectorize/LoopVectorize.cpp| 24 +- .../Transforms/Vectorize/VPRecipeBuilder.h| 3 +- llvm/lib/Transforms/Vectorize/VPlan.h | 14 +- .../partial-reduce-dot-product-neon.ll| 60 ++- .../AArch64/partial-reduce-dot-product.ll | 414 ++ 5 files changed, 495 insertions(+), 20 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index c9f314c0ba481..da701ef9ff1a2 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -5026,10 +5026,23 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. 
+auto VF = VFs[J]; +if (auto *ReductionR = dyn_cast(R)) + VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor()); +else if (auto *PartialReductionR = + dyn_cast(R)) + VF = VF.divideCoefficientBy(PartialReductionR->getScaleFactor()); +if (VF != VFs[J]) + LLVM_DEBUG(dbgs() << "LV(REG): Scaled down VF from " << VFs[J] +<< " to " << VF << " for "; + R->dump();); + for (VPValue *DefV : R->definedValues()) { Type *ScalarTy = TypeInfo.inferScalarType(DefV); unsigned ClassID = TTI.getRegisterClassForType(true, ScalarTy); - RegUsage[ClassID] += GetRegUsage(ScalarTy, VFs[J]); + RegUsage[ClassID] += GetRegUsage(ScalarTy, VF); } } } @@ -8963,8 +8976,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe( if (isa(Instr) || isa(Instr)) return tryToWidenMemory(Instr, Operands, Range); - if (getScalingForReduction(Instr)) -return tryToCreatePartialReduction(Instr, Operands); + if (auto ScaleFactor = getScalingForReduction(Instr)) +return tryToCreatePartialReduction(Instr, Operands, ScaleFactor.value()); if (!shouldWiden(Instr, Range)) return nullptr; @@ -8988,7 +9001,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe( VPRecipeBase * VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction, - ArrayRef Operands) { + ArrayRef Operands, + unsigned ScaleFactor) { assert(Operands.size() == 2 && "Unexpected number of operands for partial reduction"); @@ -9021,7 +9035,7 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction, BinOp = Builder.createSelect(Mask, BinOp, Zero, Reduction->getDebugLoc()); } return new VPPartialReductionRecipe(ReductionOpcode, BinOp, Accumulator, - Reduction); + ScaleFactor, Reduction); } void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF, diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h index 334cfbad8bd7c..fd0064a34c4c9 100644 --- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h +++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h 
@@ -178,7 +178,8 @@ class VPRecipeBuilder { /// Create and return a partial reduction recipe for a reduction instruction /// along with binary operation and reduction phi operands. VPRecipeBase *tryToCreatePartialReduction(Instruction *Reduction, -ArrayRef Operands); +ArrayRef Operands, +unsigned ScaleFactor); /// Set the recipe created for given ingredient. void setRecipe(Instruction *I, VPRecipeBase *R) { diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h index 80b3d2a760293..d84efb1bd6850 100644 --- a/llvm/lib/Transforms/Vectorize/VPlan.h +++ b/llvm/lib/Transforms/Vectorize/VPlan.h @@ -2001,6 +2001,8 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe, /// Generate the phi/select nodes. void execute(VPTransformState &State) override; + unsigned getVFScaleFactor() const { return VFScaleFactor; } + #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) /// Print the recipe. void print(raw_ostream &O, const Twine &Indent, @@ -2031,17 +2033,19 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe, /// scalar value. class VPPartialR
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -5026,10 +5026,23 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef<ElementCount> VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. +auto VF = VFs[J]; +if (auto *ReductionR = dyn_cast<VPReductionPHIRecipe>(R)) + VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor()); +else if (auto *PartialReductionR = + dyn_cast<VPPartialReductionRecipe>(R)) + VF = VF.divideCoefficientBy(PartialReductionR->getScaleFactor()); +if (VF != VFs[J]) SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/133090
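The arithmetic in the hunk above is small but easy to misread for scalable VFs: only the VF's coefficient is divided by the recipe's scale factor, the vscale multiplier is untouched. A rough standalone model (the `EC` struct and names are invented stand-ins for LLVM's `ElementCount`, not the real API):

```cpp
// Toy stand-in for llvm::ElementCount: Min lanes, optionally scaled by vscale.
struct EC {
  unsigned Min;
  bool Scalable;
};

// Mirrors the spirit of ElementCount::divideCoefficientBy(): only the
// coefficient shrinks; whether the count is vscale-relative is preserved.
EC divideCoefficientBy(EC VF, unsigned ScaleFactor) {
  return {VF.Min / ScaleFactor, VF.Scalable};
}

// Register-usage estimation should cost a scaled reduction at the narrower
// width, e.g. a partial reduction with scale factor 4 on a <vscale x 16 x i8>
// input accumulates into <vscale x 4 x i32>.
unsigned lanesForRegUsage(EC VF, unsigned ScaleFactor) {
  return divideCoefficientBy(VF, ScaleFactor).Min;
}
```

This is why the patch reports "Scaled down VF from X to Y" in the debug output before summing per-class register usage.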
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/SamTebbs33 edited https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -2031,17 +2033,19 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe, /// scalar value. class VPPartialReductionRecipe : public VPSingleDefRecipe { unsigned Opcode; + unsigned ScaleFactor; SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -5026,10 +5026,23 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef<ElementCount> VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. +auto VF = VFs[J]; +if (auto *ReductionR = dyn_cast<VPReductionPHIRecipe>(R)) SamTebbs33 wrote: Yeah that's a nice idea. We could add a `VPScaledRecipe` class. I agree with doing it afterwards. https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/133090 >From d0a9e1c7e89abc5890d7303a2e22a9a56e2f022b Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 26 Mar 2025 14:01:59 + Subject: [PATCH 1/6] [LV] Reduce register usage for scaled reductions --- .../Transforms/Vectorize/LoopVectorize.cpp| 24 ++- .../Transforms/Vectorize/VPRecipeBuilder.h| 3 +- llvm/lib/Transforms/Vectorize/VPlan.h | 14 +- .../partial-reduce-dot-product-neon.ll| 116 .../AArch64/partial-reduce-dot-product.ll | 173 ++ .../LoopVectorize/AArch64/reg-usage.ll| 6 +- 6 files changed, 171 insertions(+), 165 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 2ebc7017f426a..486405991c612 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -5036,10 +5036,23 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. 
+auto VF = VFs[J]; +if (auto *ReductionR = dyn_cast(R)) + VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor()); +else if (auto *PartialReductionR = + dyn_cast(R)) + VF = VF.divideCoefficientBy(PartialReductionR->getScaleFactor()); +if (VF != VFs[J]) + LLVM_DEBUG(dbgs() << "LV(REG): Scaled down VF from " << VFs[J] +<< " to " << VF << " for "; + R->dump();); + for (VPValue *DefV : R->definedValues()) { Type *ScalarTy = TypeInfo.inferScalarType(DefV); unsigned ClassID = TTI.getRegisterClassForType(true, ScalarTy); - RegUsage[ClassID] += GetRegUsage(ScalarTy, VFs[J]); + RegUsage[ClassID] += GetRegUsage(ScalarTy, VF); } } } @@ -9137,8 +9150,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe( if (isa(Instr) || isa(Instr)) return tryToWidenMemory(Instr, Operands, Range); - if (getScalingForReduction(Instr)) -return tryToCreatePartialReduction(Instr, Operands); + if (auto ScaleFactor = getScalingForReduction(Instr)) +return tryToCreatePartialReduction(Instr, Operands, ScaleFactor.value()); if (!shouldWiden(Instr, Range)) return nullptr; @@ -9162,7 +9175,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe( VPRecipeBase * VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction, - ArrayRef Operands) { + ArrayRef Operands, + unsigned ScaleFactor) { assert(Operands.size() == 2 && "Unexpected number of operands for partial reduction"); @@ -9195,7 +9209,7 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction, BinOp = Builder.createSelect(Mask, BinOp, Zero, Reduction->getDebugLoc()); } return new VPPartialReductionRecipe(ReductionOpcode, BinOp, Accumulator, - Reduction); + ScaleFactor, Reduction); } void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF, diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h index 334cfbad8bd7c..fd0064a34c4c9 100644 --- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h +++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h 
@@ -178,7 +178,8 @@ class VPRecipeBuilder { /// Create and return a partial reduction recipe for a reduction instruction /// along with binary operation and reduction phi operands. VPRecipeBase *tryToCreatePartialReduction(Instruction *Reduction, -ArrayRef Operands); +ArrayRef Operands, +unsigned ScaleFactor); /// Set the recipe created for given ingredient. void setRecipe(Instruction *I, VPRecipeBase *R) { diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h index 37e0a176ab1cc..376526e804b4b 100644 --- a/llvm/lib/Transforms/Vectorize/VPlan.h +++ b/llvm/lib/Transforms/Vectorize/VPlan.h @@ -2033,6 +2033,8 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe, /// Generate the phi/select nodes. void execute(VPTransformState &State) override; + unsigned getVFScaleFactor() const { return VFScaleFactor; } + #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) /// Print the recipe. void print(raw_ostream &O, const Twine &Indent, @@ -2063,17 +2065,19 @@ class VPReductionPHIReci
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/SamTebbs33 commented: Apologies for the review requesting noise. https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/133090 >From 9a9164fce2a7fe1d602fd24cf9a9026b06190f31 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 26 Mar 2025 14:01:59 + Subject: [PATCH 1/5] [LV] Reduce register usage for scaled reductions --- .../Transforms/Vectorize/LoopVectorize.cpp| 24 +- .../Transforms/Vectorize/VPRecipeBuilder.h| 3 +- llvm/lib/Transforms/Vectorize/VPlan.h | 14 +- .../partial-reduce-dot-product-neon.ll| 118 -- .../AArch64/partial-reduce-dot-product.ll | 344 +- 5 files changed, 282 insertions(+), 221 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 1dbcbdbe083fe..400a510be308b 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -5019,10 +5019,23 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. 
+auto VF = VFs[J]; +if (auto *ReductionR = dyn_cast(R)) + VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor()); +else if (auto *PartialReductionR = + dyn_cast(R)) + VF = VF.divideCoefficientBy(PartialReductionR->getScaleFactor()); +if (VF != VFs[J]) + LLVM_DEBUG(dbgs() << "LV(REG): Scaled down VF from " << VFs[J] +<< " to " << VF << " for "; + R->dump();); + for (VPValue *DefV : R->definedValues()) { Type *ScalarTy = TypeInfo.inferScalarType(DefV); unsigned ClassID = TTI.getRegisterClassForType(true, ScalarTy); - RegUsage[ClassID] += GetRegUsage(ScalarTy, VFs[J]); + RegUsage[ClassID] += GetRegUsage(ScalarTy, VF); } } } @@ -8951,8 +8964,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe( if (isa(Instr) || isa(Instr)) return tryToWidenMemory(Instr, Operands, Range); - if (getScalingForReduction(Instr)) -return tryToCreatePartialReduction(Instr, Operands); + if (auto ScaleFactor = getScalingForReduction(Instr)) +return tryToCreatePartialReduction(Instr, Operands, ScaleFactor.value()); if (!shouldWiden(Instr, Range)) return nullptr; @@ -8976,7 +8989,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe( VPRecipeBase * VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction, - ArrayRef Operands) { + ArrayRef Operands, + unsigned ScaleFactor) { assert(Operands.size() == 2 && "Unexpected number of operands for partial reduction"); @@ -9009,7 +9023,7 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction, BinOp = Builder.createSelect(Mask, BinOp, Zero, Reduction->getDebugLoc()); } return new VPPartialReductionRecipe(ReductionOpcode, BinOp, Accumulator, - Reduction); + ScaleFactor, Reduction); } void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF, diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h index 334cfbad8bd7c..fd0064a34c4c9 100644 --- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h +++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h 
@@ -178,7 +178,8 @@ class VPRecipeBuilder { /// Create and return a partial reduction recipe for a reduction instruction /// along with binary operation and reduction phi operands. VPRecipeBase *tryToCreatePartialReduction(Instruction *Reduction, -ArrayRef Operands); +ArrayRef Operands, +unsigned ScaleFactor); /// Set the recipe created for given ingredient. void setRecipe(Instruction *I, VPRecipeBase *R) { diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h index 37e0a176ab1cc..376526e804b4b 100644 --- a/llvm/lib/Transforms/Vectorize/VPlan.h +++ b/llvm/lib/Transforms/Vectorize/VPlan.h @@ -2033,6 +2033,8 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe, /// Generate the phi/select nodes. void execute(VPTransformState &State) override; + unsigned getVFScaleFactor() const { return VFScaleFactor; } + #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) /// Print the recipe. void print(raw_ostream &O, const Twine &Indent, @@ -2063,17 +2065,19 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe, /// scalar value. class VPPart
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
SamTebbs33 wrote: Good idea, done. https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -5026,10 +5026,24 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef<ElementCount> VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. +auto VF = VFs[J]; +if (auto *ReductionR = dyn_cast<VPReductionPHIRecipe>(R)) + VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor()); +else if (auto *PartialReductionR = + dyn_cast<VPPartialReductionRecipe>(R)) + VF = VF.divideCoefficientBy(PartialReductionR->getVFScaleFactor()); +LLVM_DEBUG(if (VF != VFs[J]) { SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -3235,6 +3263,36 @@ void VPWidenPointerInductionRecipe::print(raw_ostream &O, const Twine &Indent, } #endif +void VPAliasLaneMaskRecipe::execute(VPTransformState &State) { + IRBuilderBase Builder = State.Builder; + Value *SinkValue = State.get(getSinkValue(), true); + Value *SourceValue = State.get(getSourceValue(), true); + + auto *Type = SinkValue->getType(); + Value *AliasMask = Builder.CreateIntrinsic( + Intrinsic::experimental_get_alias_lane_mask, + {VectorType::get(Builder.getInt1Ty(), State.VF), Type, + Builder.getInt64Ty()}, + {SourceValue, SinkValue, Builder.getInt64(getAccessedElementSize()), + Builder.getInt1(WriteAfterRead)}, + nullptr, "alias.lane.mask"); + State.set(this, AliasMask, /*IsScalar=*/false); +} + +#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) +void VPAliasLaneMaskRecipe::print(raw_ostream &O, const Twine &Indent, + VPSlotTracker &SlotTracker) const { + O << Indent << "EMIT "; + getVPSingleValue()->printAsOperand(O, SlotTracker); + O << " = alias lane mask "; SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/100579
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -195,6 +195,13 @@ enum class TailFoldingStyle { DataWithEVL, }; +enum class RTCheckStyle { + /// Branch to scalar loop if checks fails at runtime. + ScalarFallback, + /// Form a mask based on elements which won't be a WAR or RAW hazard SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/100579
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -3235,6 +3263,36 @@ void VPWidenPointerInductionRecipe::print(raw_ostream &O, const Twine &Indent, } #endif +void VPAliasLaneMaskRecipe::execute(VPTransformState &State) { + IRBuilderBase Builder = State.Builder; + Value *SinkValue = State.get(getSinkValue(), true); + Value *SourceValue = State.get(getSourceValue(), true); + + auto *Type = SinkValue->getType(); SamTebbs33 wrote: Not needed thanks to rebase. https://github.com/llvm/llvm-project/pull/100579
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -3073,6 +3075,56 @@ struct VPWidenStoreEVLRecipe final : public VPWidenMemoryRecipe { } }; +// Given a pointer A that is being stored to, and pointer B that is being +// read from, both with unknown lengths, create a mask that disables +// elements which could overlap across a loop iteration. For example, if A +// is X and B is X + 2 with VF being 4, only the final two elements of the +// loaded vector can be stored since they don't overlap with the stored +// vector. %b.vec = load %b ; = [s, t, u, v] +// [...] +// store %a, %b.vec ; only u and v can be stored as their addresses don't +// overlap with %a + (VF - 1) SamTebbs33 wrote: Yes you're right, this should say that the *first* two are valid. Thanks for spotting that. I've re-worded the comment to make it more clear. https://github.com/llvm/llvm-project/pull/100579
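The corrected example (store to `X`, load from `X + 2`, VF 4, first two lanes valid) can be checked with a small scalar model of the write-after-read case. This is an illustrative sketch, not the definition of the `@llvm.experimental.get.alias.lane.mask` intrinsic; it assumes the distance is measured from the write pointer to the read pointer in whole elements:

```cpp
#include <cstdint>
#include <vector>

// Scalar sketch of a write-after-read alias lane mask (assumption: lane I is
// kept when the read address is at least I + 1 elements past the write
// address, or when the read address is not ahead of the write address at
// all — matching the X / X + 2 example in the recipe comment).
std::vector<bool> aliasLaneMaskWAR(intptr_t WritePtr, intptr_t ReadPtr,
                                   unsigned EltSize, unsigned VF) {
  intptr_t Diff = (ReadPtr - WritePtr) / (intptr_t)EltSize; // in elements
  std::vector<bool> Mask(VF);
  for (unsigned I = 0; I < VF; ++I)
    Mask[I] = Diff <= 0 || (intptr_t)I < Diff;
  return Mask;
}
```

With a write at `X` and a read at `X + 2` elements, `Diff` is 2 and the mask comes out `{1, 1, 0, 0}` — exactly the "first two are valid" correction in the reply.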
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -1416,14 +1466,14 @@ void VPlanTransforms::addActiveLaneMask( auto *FoundWidenCanonicalIVUser = find_if(Plan.getCanonicalIV()->users(), [](VPUser *U) { return isa<VPWidenCanonicalIVRecipe>(U); }); - assert(FoundWidenCanonicalIVUser && + assert(FoundWidenCanonicalIVUser && *FoundWidenCanonicalIVUser && SamTebbs33 wrote: Done, thanks. https://github.com/llvm/llvm-project/pull/100579
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -1300,14 +1301,38 @@ static VPActiveLaneMaskPHIRecipe *addVPLaneMaskPhiAndUpdateExitBranch( cast<VPInstruction>(CanonicalIVPHI->getBackedgeValue()); // TODO: Check if dropping the flags is needed if // !DataAndControlFlowWithoutRuntimeCheck. + VPValue *IncVal = CanonicalIVIncrement->getOperand(1); + assert(IncVal != CanonicalIVPHI && "Unexpected operand order"); + CanonicalIVIncrement->dropPoisonGeneratingFlags(); DebugLoc DL = CanonicalIVIncrement->getDebugLoc(); + // We can't use StartV directly in the ActiveLaneMask VPInstruction, since // we have to take unrolling into account. Each part needs to start at // Part * VF auto *VecPreheader = Plan.getVectorPreheader(); VPBuilder Builder(VecPreheader); + // Create an alias mask for each possibly-aliasing pointer pair. If there + // are multiple they are combined together with ANDs. + VPValue *AliasMask = nullptr; + + for (auto C : RTChecks) { +// FIXME: How to pass this info back? +//HasAliasMask = true; SamTebbs33 wrote: The info is actually being passed back so I can remove this FIXME. Done. https://github.com/llvm/llvm-project/pull/100579
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -3073,6 +3075,56 @@ struct VPWidenStoreEVLRecipe final : public VPWidenMemoryRecipe { } }; +// Given a pointer A that is being stored to, and pointer B that is being +// read from, both with unknown lengths, create a mask that disables +// elements which could overlap across a loop iteration. For example, if A +// is X and B is X + 2 with VF being 4, only the final two elements of the +// loaded vector can be stored since they don't overlap with the stored +// vector. %b.vec = load %b ; = [s, t, u, v] +// [...] +// store %a, %b.vec ; only u and v can be stored as their addresses don't +// overlap with %a + (VF - 1) +class VPAliasLaneMaskRecipe : public VPSingleDefRecipe { + +public: + VPAliasLaneMaskRecipe(VPValue *Src, VPValue *Sink, unsigned ElementSize, +bool WriteAfterRead) + : VPSingleDefRecipe(VPDef::VPAliasLaneMaskSC, {Src, Sink}), +ElementSize(ElementSize), WriteAfterRead(WriteAfterRead) {} + + ~VPAliasLaneMaskRecipe() override = default; + + VPAliasLaneMaskRecipe *clone() override { +return new VPAliasLaneMaskRecipe(getSourceValue(), getSinkValue(), + ElementSize, WriteAfterRead); + } + + VP_CLASSOF_IMPL(VPDef::VPAliasLaneMaskSC); + + void execute(VPTransformState &State) override; + + /// Get the VPValue* for the pointer being read from + VPValue *getSourceValue() const { return getOperand(0); } + + // Get the size of the element(s) accessed by the pointers + unsigned getAccessedElementSize() const { return ElementSize; } + + /// Get the VPValue* for the pointer being stored to + VPValue *getSinkValue() const { return getOperand(1); } + + bool isWriteAfterRead() const { return WriteAfterRead; } + +private: + unsigned ElementSize; + bool WriteAfterRead; + +#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) + /// Print the recipe. + void print(raw_ostream &O, const Twine &Indent, SamTebbs33 wrote: Done. 
https://github.com/llvm/llvm-project/pull/100579
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -77,9 +77,13 @@ struct VPlanTransforms { /// creation) and instead it is handled using active-lane-mask. \p /// DataAndControlFlowWithoutRuntimeCheck implies \p /// UseActiveLaneMaskForControlFlow. + /// RTChecks refers to the pointer pairs that need aliasing elements to be + /// masked off each loop iteration. SamTebbs33 wrote: Added, let me know if anything about it should change. https://github.com/llvm/llvm-project/pull/100579
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -1331,14 +1356,37 @@ static VPActiveLaneMaskPHIRecipe *addVPLaneMaskPhiAndUpdateExitBranch( "index.part.next"); // Create the active lane mask instruction in the VPlan preheader. - auto *EntryALM = + VPValue *Mask = Builder.createNaryOp(VPInstruction::ActiveLaneMask, {EntryIncrement, TC}, DL, "active.lane.mask.entry"); // Now create the ActiveLaneMaskPhi recipe in the main loop using the // preheader ActiveLaneMask instruction. - auto *LaneMaskPhi = new VPActiveLaneMaskPHIRecipe(EntryALM, DebugLoc()); + auto *LaneMaskPhi = new VPActiveLaneMaskPHIRecipe(Mask, DebugLoc()); LaneMaskPhi->insertAfter(CanonicalIVPHI); + VPValue *LaneMask = LaneMaskPhi; + if (AliasMask) { +// Increment phi by correct amount. +Builder.setInsertPoint(CanonicalIVIncrement); + +VPValue *IncrementBy = Builder.createNaryOp(VPInstruction::PopCount, +{AliasMask}, DL, "popcount"); +Type *IVType = CanonicalIVPHI->getScalarType(); + +if (IVType->getScalarSizeInBits() < 64) { + auto *Cast = + new VPScalarCastRecipe(Instruction::Trunc, IncrementBy, IVType); + Cast->insertAfter(IncrementBy->getDefiningRecipe()); + IncrementBy = Cast; +} +CanonicalIVIncrement->setOperand(1, IncrementBy); + +// And the alias mask so the iteration only processes non-aliasing lanes +Builder.setInsertPoint(CanonicalIVPHI->getParent(), + CanonicalIVPHI->getParent()->getFirstNonPhi()); +LaneMask = Builder.createNaryOp(Instruction::BinaryOps::And, +{LaneMaskPhi, AliasMask}, DL); SamTebbs33 wrote: We don't, and there's actually a case in the test suite that hangs because the mask is all-false. I'll start looking into a solution for that. https://github.com/llvm/llvm-project/pull/100579 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
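The hang acknowledged in the reply above is easy to reproduce in a scalar model of the transformed loop: the canonical IV now advances by `popcount(alias mask)` instead of by VF, so an all-false mask means the IV never moves. A hypothetical sketch of the trip accounting (names invented, not the VPlan code itself):

```cpp
#include <algorithm>
#include <vector>

// Sketch of the transformed loop's trip accounting: each vector iteration
// retires only popcount(alias mask) elements rather than a full VF.
// Returns the number of vector iterations, or -1 to flag the all-false-mask
// case that would otherwise spin forever (the open issue in the thread).
int vectorIterations(unsigned TripCount, const std::vector<bool> &AliasMask) {
  unsigned Step =
      (unsigned)std::count(AliasMask.begin(), AliasMask.end(), true);
  if (Step == 0)
    return -1; // all lanes alias: the IV would never advance
  unsigned IV = 0;
  int Iters = 0;
  while (IV < TripCount) {
    IV += Step; // CanonicalIVIncrement's operand is the popcount
    ++Iters;
  }
  return Iters;
}
```

With a trip count of 10 and two non-aliasing lanes per iteration, the loop takes 5 vector iterations; with zero non-aliasing lanes it flags the degenerate case instead of looping.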
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/136997 >From 10c4727074a7f5b4502ad08dc655be8fa5ffa3d2 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 23 Apr 2025 13:16:38 +0100 Subject: [PATCH 1/5] [LoopVectorizer] Bundle partial reductions with different extensions This PR adds support for extensions of different signedness to VPMulAccumulateReductionRecipe and allows such partial reductions to be bundled into that class. --- llvm/lib/Transforms/Vectorize/VPlan.h | 42 +- .../lib/Transforms/Vectorize/VPlanRecipes.cpp | 27 ++--- .../Transforms/Vectorize/VPlanTransforms.cpp | 25 - .../partial-reduce-dot-product-mixed.ll | 56 +-- .../LoopVectorize/AArch64/vplan-printing.ll | 29 +- 5 files changed, 99 insertions(+), 80 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h index 20d272e69e6e7..e11f608d068da 100644 --- a/llvm/lib/Transforms/Vectorize/VPlan.h +++ b/llvm/lib/Transforms/Vectorize/VPlan.h @@ -2493,11 +2493,13 @@ class VPExtendedReductionRecipe : public VPReductionRecipe { /// recipe is abstract and needs to be lowered to concrete recipes before /// codegen. The Operands are {ChainOp, VecOp1, VecOp2, [Condition]}. class VPMulAccumulateReductionRecipe : public VPReductionRecipe { - /// Opcode of the extend recipe. - Instruction::CastOps ExtOp; + /// Opcodes of the extend recipes. + Instruction::CastOps ExtOp0; + Instruction::CastOps ExtOp1; - /// Non-neg flag of the extend recipe. - bool IsNonNeg = false; + /// Non-neg flags of the extend recipe. 
+ bool IsNonNeg0 = false; + bool IsNonNeg1 = false; Type *ResultTy; @@ -2512,7 +2514,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { MulAcc->getCondOp(), MulAcc->isOrdered(), WrapFlagsTy(MulAcc->hasNoUnsignedWrap(), MulAcc->hasNoSignedWrap()), MulAcc->getDebugLoc()), -ExtOp(MulAcc->getExtOpcode()), IsNonNeg(MulAcc->isNonNeg()), +ExtOp0(MulAcc->getExt0Opcode()), ExtOp1(MulAcc->getExt1Opcode()), +IsNonNeg0(MulAcc->isNonNeg0()), IsNonNeg1(MulAcc->isNonNeg1()), ResultTy(MulAcc->getResultType()), IsPartialReduction(MulAcc->isPartialReduction()) {} @@ -2526,7 +2529,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { R->getCondOp(), R->isOrdered(), WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()), R->getDebugLoc()), -ExtOp(Ext0->getOpcode()), IsNonNeg(Ext0->isNonNeg()), +ExtOp0(Ext0->getOpcode()), ExtOp1(Ext1->getOpcode()), +IsNonNeg0(Ext0->isNonNeg()), IsNonNeg1(Ext1->isNonNeg()), ResultTy(ResultTy), IsPartialReduction(isa(R)) { assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) == @@ -2542,7 +2546,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { R->getCondOp(), R->isOrdered(), WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()), R->getDebugLoc()), -ExtOp(Instruction::CastOps::CastOpsEnd) { +ExtOp0(Instruction::CastOps::CastOpsEnd), +ExtOp1(Instruction::CastOps::CastOpsEnd) { assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) == Instruction::Add && "The reduction instruction in MulAccumulateReductionRecipe must be " @@ -2586,19 +2591,26 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { VPValue *getVecOp1() const { return getOperand(2); } /// Return if this MulAcc recipe contains extend instructions. - bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; } + bool isExtended() const { return ExtOp0 != Instruction::CastOps::CastOpsEnd; } /// Return if the operands of mul instruction come from same extend. 
- bool isSameExtend() const { return getVecOp0() == getVecOp1(); } + bool isSameExtendVal() const { return getVecOp0() == getVecOp1(); } - /// Return the opcode of the underlying extend. - Instruction::CastOps getExtOpcode() const { return ExtOp; } + /// Return the opcode of the underlying extends. + Instruction::CastOps getExt0Opcode() const { return ExtOp0; } + Instruction::CastOps getExt1Opcode() const { return ExtOp1; } + + /// Return if the first extend's opcode is ZExt. + bool isZExt0() const { return ExtOp0 == Instruction::CastOps::ZExt; } + + /// Return if the second extend's opcode is ZExt. + bool isZExt1() const { return ExtOp1 == Instruction::CastOps::ZExt; } - /// Return if the extend opcode is ZExt. - bool isZExt() const { return ExtOp == Instruction::CastOps::ZExt; } + /// Return the non negative flag of the first ext recipe. + bool isNonNeg0() const { return IsNonNeg0; } - /// Return the non negative flag of the ext recipe. - bool isNonNeg() const { return IsNonNeg; } + /// Return the non negative flag of the second
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
@@ -2586,22 +2590,21 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { VPValue *getVecOp1() const { return getOperand(2); } /// Return if this MulAcc recipe contains extend instructions. - bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; } + bool isExtended() const { +return getVecOp0Info().ExtOp != Instruction::CastOps::CastOpsEnd; SamTebbs33 wrote: That can't happen at the moment, but I think you're right and it's worth considering the other extension as well. Done. https://github.com/llvm/llvm-project/pull/136997 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
SamTebbs33 wrote: Ping :) https://github.com/llvm/llvm-project/pull/136173

[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
SamTebbs33 wrote: Superseded by https://github.com/llvm/llvm-project/pull/144908 https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
https://github.com/SamTebbs33 closed https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
SamTebbs33 wrote: Really sorry for the spam again, I pushed to the user branch in my fork rather than the base branch in llvm :facepalm: https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LV] Use VPReductionRecipe for partial reductions (PR #146073)
https://github.com/SamTebbs33 closed https://github.com/llvm/llvm-project/pull/146073
[llvm-branch-commits] [llvm] [LV] Use VPReductionRecipe for partial reductions (PR #146073)
SamTebbs33 wrote: Closed in favour of a PR based on top of https://github.com/llvm/llvm-project/pull/147302 https://github.com/llvm/llvm-project/pull/146073
[llvm-branch-commits] [llvm] [LV] Use VPReductionRecipe for partial reductions (PR #146073)
@@ -2744,6 +2702,12 @@ class VPSingleDefBundleRecipe : public VPSingleDefRecipe { /// vector operands, performing a reduction.add on the result, and adding /// the scalar result to a chain. MulAccumulateReduction, +/// Represent an inloop multiply-accumulate reduction, multiplying the +/// extended vector operands, negating the multiplication, performing a +/// reduction.add +/// on the result, and adding +/// the scalar result to a chain. +ExtNegatedMulAccumulateReduction, SamTebbs33 wrote: Thanks Florian, that sounds like a good approach. https://github.com/llvm/llvm-project/pull/146073
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
https://github.com/SamTebbs33 created https://github.com/llvm/llvm-project/pull/147255 This PR bundles sub reductions into the VPExpressionRecipe class and adjusts the cost functions to take the negation into account. >From 1a5f4e42e4f9d1eae0222302dcabdf08492f67c3 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Mon, 30 Jun 2025 14:29:54 +0100 Subject: [PATCH] [LV] Bundle sub reductions into VPExpressionRecipe This PR bundles sub reductions into the VPExpressionRecipe class and adjusts the cost functions to take the negation into account. --- .../llvm/Analysis/TargetTransformInfo.h | 4 +- .../llvm/Analysis/TargetTransformInfoImpl.h | 2 +- llvm/include/llvm/CodeGen/BasicTTIImpl.h | 3 + llvm/lib/Analysis/TargetTransformInfo.cpp | 5 +- .../AArch64/AArch64TargetTransformInfo.cpp| 7 +- .../AArch64/AArch64TargetTransformInfo.h | 2 +- .../lib/Target/ARM/ARMTargetTransformInfo.cpp | 7 +- llvm/lib/Target/ARM/ARMTargetTransformInfo.h | 1 + .../Transforms/Vectorize/LoopVectorize.cpp| 6 +- llvm/lib/Transforms/Vectorize/VPlan.h | 11 ++ .../lib/Transforms/Vectorize/VPlanRecipes.cpp | 35 - .../Transforms/Vectorize/VPlanTransforms.cpp | 33 ++-- .../Transforms/Vectorize/VectorCombine.cpp| 4 +- .../vplan-printing-reductions.ll | 143 ++ 14 files changed, 236 insertions(+), 27 deletions(-) diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h index c43870392361d..3cc0ea01953c3 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfo.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h @@ -1645,8 +1645,10 @@ class TargetTransformInfo { /// extensions. This is the cost of as: /// ResTy vecreduce.add(mul (A, B)). /// ResTy vecreduce.add(mul(ext(Ty A), ext(Ty B)). + /// The multiply can optionally be negated, which signifies that it is a sub + /// reduction. 
LLVM_ABI InstructionCost getMulAccReductionCost( - bool IsUnsigned, Type *ResTy, VectorType *Ty, + bool IsUnsigned, Type *ResTy, VectorType *Ty, bool Negated, TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) const; /// Calculate the cost of an extended reduction pattern, similar to diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h index 12f87226c5f57..fd22981a5dbf3 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h @@ -960,7 +960,7 @@ class TargetTransformInfoImplBase { virtual InstructionCost getMulAccReductionCost(bool IsUnsigned, Type *ResTy, VectorType *Ty, - TTI::TargetCostKind CostKind) const { + bool Negated, TTI::TargetCostKind CostKind) const { return 1; } diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h index bf958e100f2ac..a9c9fa6d1db0d 100644 --- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h +++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h @@ -3116,7 +3116,10 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase { InstructionCost getMulAccReductionCost(bool IsUnsigned, Type *ResTy, VectorType *Ty, + bool Negated, TTI::TargetCostKind CostKind) const override { +if (Negated) + return InstructionCost::getInvalid(CostKind); // Without any native support, this is equivalent to the cost of // vecreduce.add(mul(ext(Ty A), ext(Ty B))) or // vecreduce.add(mul(A, B)). 
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp index 3ebd9d487ba04..ba0d070bffe6d 100644 --- a/llvm/lib/Analysis/TargetTransformInfo.cpp +++ b/llvm/lib/Analysis/TargetTransformInfo.cpp @@ -1274,9 +1274,10 @@ InstructionCost TargetTransformInfo::getExtendedReductionCost( } InstructionCost TargetTransformInfo::getMulAccReductionCost( -bool IsUnsigned, Type *ResTy, VectorType *Ty, +bool IsUnsigned, Type *ResTy, VectorType *Ty, bool Negated, TTI::TargetCostKind CostKind) const { - return TTIImpl->getMulAccReductionCost(IsUnsigned, ResTy, Ty, CostKind); + return TTIImpl->getMulAccReductionCost(IsUnsigned, ResTy, Ty, Negated, + CostKind); } InstructionCost diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp index 380faa6cf6939..d9a367535baf4 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp @@ -5316,8 +5316,10 @@ InstructionCost AArch64TTIImpl::getExtendedReductionCost( InstructionCost AArch64TTIImpl::getMulAccReductionCost(bool IsUnsigned, Type *ResTy, - VectorType *VecTy, + VectorType *VecTy, bo
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/147255 >From 1a5f4e42e4f9d1eae0222302dcabdf08492f67c3 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Mon, 30 Jun 2025 14:29:54 +0100 Subject: [PATCH 1/2] [LV] Bundle sub reductions into VPExpressionRecipe This PR bundles sub reductions into the VPExpressionRecipe class and adjusts the cost functions to take the negation into account. --- .../llvm/Analysis/TargetTransformInfo.h | 4 +- .../llvm/Analysis/TargetTransformInfoImpl.h | 2 +- llvm/include/llvm/CodeGen/BasicTTIImpl.h | 3 + llvm/lib/Analysis/TargetTransformInfo.cpp | 5 +- .../AArch64/AArch64TargetTransformInfo.cpp| 7 +- .../AArch64/AArch64TargetTransformInfo.h | 2 +- .../lib/Target/ARM/ARMTargetTransformInfo.cpp | 7 +- llvm/lib/Target/ARM/ARMTargetTransformInfo.h | 1 + .../Transforms/Vectorize/LoopVectorize.cpp| 6 +- llvm/lib/Transforms/Vectorize/VPlan.h | 11 ++ .../lib/Transforms/Vectorize/VPlanRecipes.cpp | 35 - .../Transforms/Vectorize/VPlanTransforms.cpp | 33 ++-- .../Transforms/Vectorize/VectorCombine.cpp| 4 +- .../vplan-printing-reductions.ll | 143 ++ 14 files changed, 236 insertions(+), 27 deletions(-) diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h index c43870392361d..3cc0ea01953c3 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfo.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h @@ -1645,8 +1645,10 @@ class TargetTransformInfo { /// extensions. This is the cost of as: /// ResTy vecreduce.add(mul (A, B)). /// ResTy vecreduce.add(mul(ext(Ty A), ext(Ty B)). + /// The multiply can optionally be negated, which signifies that it is a sub + /// reduction. 
LLVM_ABI InstructionCost getMulAccReductionCost( - bool IsUnsigned, Type *ResTy, VectorType *Ty, + bool IsUnsigned, Type *ResTy, VectorType *Ty, bool Negated, TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) const; /// Calculate the cost of an extended reduction pattern, similar to diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h index 12f87226c5f57..fd22981a5dbf3 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h @@ -960,7 +960,7 @@ class TargetTransformInfoImplBase { virtual InstructionCost getMulAccReductionCost(bool IsUnsigned, Type *ResTy, VectorType *Ty, - TTI::TargetCostKind CostKind) const { + bool Negated, TTI::TargetCostKind CostKind) const { return 1; } diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h index bf958e100f2ac..a9c9fa6d1db0d 100644 --- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h +++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h @@ -3116,7 +3116,10 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase { InstructionCost getMulAccReductionCost(bool IsUnsigned, Type *ResTy, VectorType *Ty, + bool Negated, TTI::TargetCostKind CostKind) const override { +if (Negated) + return InstructionCost::getInvalid(CostKind); // Without any native support, this is equivalent to the cost of // vecreduce.add(mul(ext(Ty A), ext(Ty B))) or // vecreduce.add(mul(A, B)). 
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp index 3ebd9d487ba04..ba0d070bffe6d 100644 --- a/llvm/lib/Analysis/TargetTransformInfo.cpp +++ b/llvm/lib/Analysis/TargetTransformInfo.cpp @@ -1274,9 +1274,10 @@ InstructionCost TargetTransformInfo::getExtendedReductionCost( } InstructionCost TargetTransformInfo::getMulAccReductionCost( -bool IsUnsigned, Type *ResTy, VectorType *Ty, +bool IsUnsigned, Type *ResTy, VectorType *Ty, bool Negated, TTI::TargetCostKind CostKind) const { - return TTIImpl->getMulAccReductionCost(IsUnsigned, ResTy, Ty, CostKind); + return TTIImpl->getMulAccReductionCost(IsUnsigned, ResTy, Ty, Negated, + CostKind); } InstructionCost diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp index 380faa6cf6939..d9a367535baf4 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp @@ -5316,8 +5316,10 @@ InstructionCost AArch64TTIImpl::getExtendedReductionCost( InstructionCost AArch64TTIImpl::getMulAccReductionCost(bool IsUnsigned, Type *ResTy, - VectorType *VecTy, + VectorType *VecTy, bool Negated, TTI::TargetCostKind CostKind) const { + if (Negated) +return Instruction
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -2725,6 +2729,31 @@ void VPExpressionRecipe::print(raw_ostream &O, const Twine &Indent, O << ")"; break; } + case ExpressionTypes::ExtNegatedMulAccReduction: { SamTebbs33 wrote: That was my initial approach but it required checking the number of operands to know if there was a sub or not, and I was asked to create an expression type to not rely on operand ordering being stable. https://github.com/llvm/llvm-project/pull/147255
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -1645,8 +1645,10 @@ class TargetTransformInfo { /// extensions. This is the cost of as: /// ResTy vecreduce.add(mul (A, B)). /// ResTy vecreduce.add(mul(ext(Ty A), ext(Ty B)). + /// The multiply can optionally be negated, which signifies that it is a sub + /// reduction. LLVM_ABI InstructionCost getMulAccReductionCost( - bool IsUnsigned, Type *ResTy, VectorType *Ty, + bool IsUnsigned, Type *ResTy, VectorType *Ty, bool Negated, SamTebbs33 wrote: Good idea, done. https://github.com/llvm/llvm-project/pull/147255
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -5538,7 +5538,7 @@ LoopVectorizationCostModel::getReductionPatternCost(Instruction *I, TTI::CastContextHint::None, CostKind, RedOp); InstructionCost RedCost = TTI.getMulAccReductionCost( -IsUnsigned, RdxDesc.getRecurrenceType(), ExtType, CostKind); +IsUnsigned, RdxDesc.getRecurrenceType(), ExtType, false, CostKind); SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/147255
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -3116,7 +3116,10 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase { InstructionCost getMulAccReductionCost(bool IsUnsigned, Type *ResTy, VectorType *Ty, + bool Negated, TTI::TargetCostKind CostKind) const override { +if (Negated) SamTebbs33 wrote: Thanks, I've added a cost for the sub. https://github.com/llvm/llvm-project/pull/147255
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -2757,6 +2757,12 @@ class VPExpressionRecipe : public VPSingleDefRecipe { /// vector operands, performing a reduction.add on the result, and adding /// the scalar result to a chain. MulAccReduction, +/// Represent an inloop multiply-accumulate reduction, multiplying the +/// extended vector operands, negating the multiplication, performing a +/// reduction.add +/// on the result, and adding SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/147255
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -1401,8 +1401,8 @@ static void analyzeCostOfVecReduction(const IntrinsicInst &II, TTI::CastContextHint::None, CostKind, RedOp); CostBeforeReduction = ExtCost * 2 + MulCost + Ext2Cost; -CostAfterReduction = -TTI.getMulAccReductionCost(IsUnsigned, II.getType(), ExtType, CostKind); +CostAfterReduction = TTI.getMulAccReductionCost(IsUnsigned, II.getType(), +ExtType, false, CostKind); SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/147255