[llvm-branch-commits] [llvm] Add frontend for search (PR #107210)
https://github.com/SamTebbs33 created https://github.com/llvm/llvm-project/pull/107210 None >From 8296e727435492d4a5b49deea76c098d6f54081f Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 4 Sep 2024 11:05:17 +0100 Subject: [PATCH] Add frontend for search --- graphite-demo/frontend.jsx | 56 ++ 1 file changed, 56 insertions(+) create mode 100644 graphite-demo/frontend.jsx diff --git a/graphite-demo/frontend.jsx b/graphite-demo/frontend.jsx new file mode 100644 index 00..dd6a2a3ba66cc5 --- /dev/null +++ b/graphite-demo/frontend.jsx @@ -0,0 +1,56 @@ +import React, { useEffect, useState } from 'react'; + +const TaskSearch = () => { + const [tasks, setTasks] = useState([]); + const [loading, setLoading] = useState(true); + const [error, setError] = useState(null); + const [searchQuery, setSearchQuery] = useState(''); + + useEffect(() => { +setLoading(true); +fetch(`/search?query=${encodeURIComponent(searchQuery)}`) + .then(response => { +if (!response.ok) { + throw new Error('Network response was not ok'); +} +return response.json(); + }) + .then(data => { +setTasks(data); +setLoading(false); + }) + .catch(error => { +setError(error.message); +setLoading(false); + }); + }, [searchQuery]); // Depend on searchQuery + + if (loading) { +return Loading...; + } + + if (error) { +return Error: {error}; + } + + return ( + + Task Search + setSearchQuery(e.target.value)} + /> + +{tasks.map(task => ( + +{task.description} + +))} + + + ); +}; + +export default TaskSearch; \ No newline at end of file ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
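The patch above was flattened by the mailing-list archive and the JSX markup tags were stripped, so the component cannot be reproduced verbatim. As a hedged sketch, the data-fetching behaviour (not the markup) can be reconstructed in plain JavaScript; `fetchImpl` and `fakeFetch` are stand-ins introduced here for illustration, not names from the patch.

```javascript
// Hedged reconstruction of the fetch logic from the patch above.
// Only the data-fetching behaviour is sketched; `fetchImpl` stands in
// for the browser's fetch so the logic can run outside a browser.
async function searchTasks(query, fetchImpl) {
  const response = await fetchImpl(`/search?query=${encodeURIComponent(query)}`);
  if (!response.ok) {
    // Mirrors the patch: any non-2xx response becomes an error.
    throw new Error('Network response was not ok');
  }
  return response.json();
}

// Minimal fake fetch for demonstration.
const fakeFetch = async (url) => ({
  ok: true,
  json: async () => [{ id: 1, description: `results for ${url}` }],
});

searchTasks('fix & test', fakeFetch).then(tasks => {
  console.log(tasks.length); // 1
});
```

Note how `encodeURIComponent` protects the query string: `'fix & test'` becomes `fix%20%26%20test`, so an ampersand in the search text cannot be misread as a second query parameter.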
[llvm-branch-commits] [llvm] Add frontend for search (PR #107210)
SamTebbs33 wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://graphite.dev/docs/merge-pull-requests).

* **#107210** 👈 (this PR)
* **#107209**
* `main`

This stack of pull requests is managed by Graphite (https://stacking.dev/).

https://github.com/llvm/llvm-project/pull/107210
[llvm-branch-commits] [llvm] Add frontend for search (PR #107210)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/107210 >From 4dae516fc2be004f79362b455b835754eeda953d Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 4 Sep 2024 11:05:17 +0100 Subject: [PATCH] Add frontend for search
[llvm-branch-commits] [llvm] Add user search (PR #107211)
https://github.com/SamTebbs33 created https://github.com/llvm/llvm-project/pull/107211 None >From e99c4dca4bfb7bed5c3069e056fb566b9c655eaa Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 4 Sep 2024 11:07:55 +0100 Subject: [PATCH] Add user search --- graphite-demo/frontend.jsx | 23 +-- graphite-demo/server.js| 29 + 2 files changed, 42 insertions(+), 10 deletions(-) diff --git a/graphite-demo/frontend.jsx b/graphite-demo/frontend.jsx index dd6a2a3ba66cc5..10512ee5f98f86 100644 --- a/graphite-demo/frontend.jsx +++ b/graphite-demo/frontend.jsx @@ -1,7 +1,8 @@ import React, { useEffect, useState } from 'react'; -const TaskSearch = () => { +const TaskAndUserSearch = () => { const [tasks, setTasks] = useState([]); + const [users, setUsers] = useState([]); const [loading, setLoading] = useState(true); const [error, setError] = useState(null); const [searchQuery, setSearchQuery] = useState(''); @@ -16,14 +17,15 @@ const TaskSearch = () => { return response.json(); }) .then(data => { -setTasks(data); +setTasks(data.tasks); +setUsers(data.users); setLoading(false); }) .catch(error => { setError(error.message); setLoading(false); }); - }, [searchQuery]); // Depend on searchQuery + }, [searchQuery]); if (loading) { return Loading...; @@ -35,13 +37,14 @@ const TaskSearch = () => { return ( - Task Search + Search Tasks and Users setSearchQuery(e.target.value)} /> + Tasks {tasks.map(task => ( @@ -49,8 +52,16 @@ const TaskSearch = () => { ))} + Users + +{users.map(user => ( + +{user.name} + +))} + ); }; -export default TaskSearch; \ No newline at end of file +export default TaskAndUserSearch; \ No newline at end of file diff --git a/graphite-demo/server.js b/graphite-demo/server.js index cf7ec6507287f8..ff79b7d4915f8d 100644 --- a/graphite-demo/server.js +++ b/graphite-demo/server.js @@ -18,17 +18,38 @@ const tasks = [ } ]; +// Fake data for users +const users = [ + { +id: 101, +name: 'Alice Smith' + }, + { +id: 102, +name: 'Bob Johnson' + }, + { +id: 103, +name: 'Charlie 
Brown' + } +]; + app.get('/search', (req, res) => { // Retrieve the query parameter const query = req.query.query?.toLowerCase() || ''; // Filter tasks based on the query - const filteredTasks = tasks.filter(task => task.description.toLowerCase().includes(query)); + const filteredTasks = tasks.filter(task => +task.description.toLowerCase().includes(query) + ).sort((a, b) => a.description.localeCompare(b.description)); - // Sort the filtered tasks alphabetically by description - const sortedTasks = filteredTasks.sort((a, b) => a.description.localeCompare(b.description)); + // Filter users based on the query + const filteredUsers = users.filter(user => +user.name.toLowerCase().includes(query) + ).sort((a, b) => a.name.localeCompare(b.name)); - res.json(sortedTasks); + // Return both sets of results + res.json({ tasks: filteredTasks, users: filteredUsers }); }); app.listen(port, () => {
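The `/search` handler in the patch above filters and sorts both collections before responding. Its core logic can be sketched as a pure function, exercised without Express; the `users` sample mirrors the fake data in the diff, while the `tasks` entries are illustrative stand-ins.

```javascript
// Standalone sketch of the /search handler's filtering logic from the
// patch above, extracted as a pure function. The `users` data mirrors
// the diff; the `tasks` entries are made up for the demo.
const users = [
  { id: 101, name: 'Alice Smith' },
  { id: 102, name: 'Bob Johnson' },
  { id: 103, name: 'Charlie Brown' },
];
const tasks = [
  { id: 1, description: 'Write docs' },
  { id: 2, description: 'Review branch' },
];

function search(query) {
  const q = (query || '').toLowerCase();
  // Filter case-insensitively, then sort alphabetically, as in the patch.
  const filteredTasks = tasks
    .filter(task => task.description.toLowerCase().includes(q))
    .sort((a, b) => a.description.localeCompare(b.description));
  const filteredUsers = users
    .filter(user => user.name.toLowerCase().includes(q))
    .sort((a, b) => a.name.localeCompare(b.name));
  // Return both sets of results, keyed as the frontend expects.
  return { tasks: filteredTasks, users: filteredUsers };
}

console.log(search('o').users.map(u => u.name));
// [ 'Bob Johnson', 'Charlie Brown' ]
```

Keeping the shape `{ tasks, users }` stable matters here: the frontend half of the stack reads `data.tasks` and `data.users`, so renaming either key would break it silently.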
[llvm-branch-commits] [llvm] Add user search (PR #107211)
SamTebbs33 wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://graphite.dev/docs/merge-pull-requests).

* **#107211** 👈 (this PR)
* **#107210**
* **#107209**
* `main`

This stack of pull requests is managed by Graphite (https://stacking.dev/).

https://github.com/llvm/llvm-project/pull/107211
[llvm-branch-commits] [llvm] Add user search (PR #107211)
https://github.com/SamTebbs33 closed https://github.com/llvm/llvm-project/pull/107211
[llvm-branch-commits] [llvm] Add frontend for search (PR #107210)
https://github.com/SamTebbs33 closed https://github.com/llvm/llvm-project/pull/107210
[llvm-branch-commits] [llvm] 60fda8e - [ARM] Add a pass that re-arranges blocks when there is a backwards WLS branch
Author: Sam Tebbs Date: 2021-01-13T17:23:00Z New Revision: 60fda8ebb6dc4e2ac1cc181c0ab8019c4309cb22 URL: https://github.com/llvm/llvm-project/commit/60fda8ebb6dc4e2ac1cc181c0ab8019c4309cb22 DIFF: https://github.com/llvm/llvm-project/commit/60fda8ebb6dc4e2ac1cc181c0ab8019c4309cb22.diff LOG: [ARM] Add a pass that re-arranges blocks when there is a backwards WLS branch Blocks can be laid out such that a t2WhileLoopStart branches backwards. This is forbidden by the architecture and so it fails to be converted into a low-overhead loop. This new pass checks for these cases and moves the target block, fixing any fall-through that would then be broken. Differential Revision: https://reviews.llvm.org/D92385 Added: llvm/lib/Target/ARM/ARMBlockPlacement.cpp llvm/test/CodeGen/Thumb2/block-placement.mir Modified: llvm/lib/Target/ARM/ARM.h llvm/lib/Target/ARM/ARMTargetMachine.cpp llvm/lib/Target/ARM/CMakeLists.txt llvm/test/CodeGen/ARM/O3-pipeline.ll Removed: diff --git a/llvm/lib/Target/ARM/ARM.h b/llvm/lib/Target/ARM/ARM.h index d8a4e4c31012..f4fdc9803728 100644 --- a/llvm/lib/Target/ARM/ARM.h +++ b/llvm/lib/Target/ARM/ARM.h @@ -37,6 +37,7 @@ class PassRegistry; Pass *createMVETailPredicationPass(); FunctionPass *createARMLowOverheadLoopsPass(); +FunctionPass *createARMBlockPlacementPass(); Pass *createARMParallelDSPPass(); FunctionPass *createARMISelDag(ARMBaseTargetMachine &TM, CodeGenOpt::Level OptLevel); @@ -71,6 +72,7 @@ void initializeThumb2ITBlockPass(PassRegistry &); void initializeMVEVPTBlockPass(PassRegistry &); void initializeMVEVPTOptimisationsPass(PassRegistry &); void initializeARMLowOverheadLoopsPass(PassRegistry &); +void initializeARMBlockPlacementPass(PassRegistry &); void initializeMVETailPredicationPass(PassRegistry &); void initializeMVEGatherScatterLoweringPass(PassRegistry &); void initializeARMSLSHardeningPass(PassRegistry &); diff --git a/llvm/lib/Target/ARM/ARMBlockPlacement.cpp b/llvm/lib/Target/ARM/ARMBlockPlacement.cpp new file mode 100644 index 
..fda05f526335 --- /dev/null +++ b/llvm/lib/Target/ARM/ARMBlockPlacement.cpp @@ -0,0 +1,227 @@ +//===-- ARMBlockPlacement.cpp - ARM block placement pass ===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This pass re-arranges machine basic blocks to suit target requirements. +// Currently it only moves blocks to fix backwards WLS branches. +// +//===--===// + +#include "ARM.h" +#include "ARMBaseInstrInfo.h" +#include "ARMBasicBlockInfo.h" +#include "ARMSubtarget.h" +#include "llvm/CodeGen/MachineFunctionPass.h" +#include "llvm/CodeGen/MachineInstrBuilder.h" +#include "llvm/CodeGen/MachineLoopInfo.h" + +using namespace llvm; + +#define DEBUG_TYPE "arm-block-placement" +#define DEBUG_PREFIX "ARM Block Placement: " + +namespace llvm { +class ARMBlockPlacement : public MachineFunctionPass { +private: + const ARMBaseInstrInfo *TII; + std::unique_ptr BBUtils = nullptr; + MachineLoopInfo *MLI = nullptr; + +public: + static char ID; + ARMBlockPlacement() : MachineFunctionPass(ID) {} + + bool runOnMachineFunction(MachineFunction &MF) override; + void moveBasicBlock(MachineBasicBlock *BB, MachineBasicBlock *After); + bool blockIsBefore(MachineBasicBlock *BB, MachineBasicBlock *Other); + + void getAnalysisUsage(AnalysisUsage &AU) const override { +AU.setPreservesCFG(); +AU.addRequired(); +MachineFunctionPass::getAnalysisUsage(AU); + } +}; + +} // namespace llvm + +FunctionPass *llvm::createARMBlockPlacementPass() { + return new ARMBlockPlacement(); +} + +char ARMBlockPlacement::ID = 0; + +INITIALIZE_PASS(ARMBlockPlacement, DEBUG_TYPE, "ARM block placement", false, +false) + +bool ARMBlockPlacement::runOnMachineFunction(MachineFunction &MF) { + const ARMSubtarget &ST = static_cast(MF.getSubtarget()); + if (!ST.hasLOB()) +return false; + LLVM_DEBUG(dbgs() << DEBUG_PREFIX << "Running on " << 
MF.getName() << "\n"); + MLI = &getAnalysis(); + TII = static_cast(ST.getInstrInfo()); + BBUtils = std::unique_ptr(new ARMBasicBlockUtils(MF)); + MF.RenumberBlocks(); + BBUtils->computeAllBlockSizes(); + BBUtils->adjustBBOffsetsAfter(&MF.front()); + bool Changed = false; + + // Find loops with a backwards branching WLS. + // This requires looping over the loops in the function, checking each + // preheader for a WLS and if its target is before the preheader. If moving + // the target block wouldn't produce another backwards WLS or a new forwards + // LE branch then move the target block after the preh
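The commit above moves a loop's target block so that a `t2WhileLoopStart` no longer branches backwards. The real pass operates on `MachineBasicBlock`s with offset bookkeeping; as a simplified, hedged model, the reordering itself can be shown on a plain array of block names (the block names here are invented for the demo).

```javascript
// Illustrative model of the reordering the ARM block placement pass
// performs. A WLS branch from `preheader` to `target` is backwards
// when `target` is laid out before `preheader`; the fix is to move
// `target` to just after `preheader`. Real code must also repair any
// fall-through this breaks, which this sketch omits.
function fixBackwardsBranch(blocks, preheader, target) {
  const targetIdx = blocks.indexOf(target);
  const preheaderIdx = blocks.indexOf(preheader);
  if (targetIdx === -1 || preheaderIdx === -1 || targetIdx > preheaderIdx) {
    return blocks; // Already a forwards branch; nothing to do.
  }
  // Remove the target block, then reinsert it right after the preheader.
  const reordered = blocks.filter(b => b !== target);
  reordered.splice(reordered.indexOf(preheader) + 1, 0, target);
  return reordered;
}

console.log(fixBackwardsBranch(['target', 'a', 'preheader', 'exit'],
                               'preheader', 'target'));
// [ 'a', 'preheader', 'target', 'exit' ]
```

The architectural constraint motivating this is that a backwards WLS branch is forbidden, so without the move the loop fails to become a low-overhead loop.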
[llvm-branch-commits] [llvm] 5e4480b - [ARM] Don't run the block placement pass at O0
Author: Sam Tebbs Date: 2021-01-15T13:59:29Z New Revision: 5e4480b6c0f02beef5ca7f62c3427031872fcd52 URL: https://github.com/llvm/llvm-project/commit/5e4480b6c0f02beef5ca7f62c3427031872fcd52 DIFF: https://github.com/llvm/llvm-project/commit/5e4480b6c0f02beef5ca7f62c3427031872fcd52.diff LOG: [ARM] Don't run the block placement pass at O0 The block placement pass shouldn't run unless optimisations are enabled. Differential Revision: https://reviews.llvm.org/D94691 Added: Modified: llvm/lib/Target/ARM/ARMBlockPlacement.cpp llvm/lib/Target/ARM/ARMTargetMachine.cpp Removed: diff --git a/llvm/lib/Target/ARM/ARMBlockPlacement.cpp b/llvm/lib/Target/ARM/ARMBlockPlacement.cpp index fda05f526335..20491273ea5d 100644 --- a/llvm/lib/Target/ARM/ARMBlockPlacement.cpp +++ b/llvm/lib/Target/ARM/ARMBlockPlacement.cpp @@ -58,6 +58,8 @@ INITIALIZE_PASS(ARMBlockPlacement, DEBUG_TYPE, "ARM block placement", false, false) bool ARMBlockPlacement::runOnMachineFunction(MachineFunction &MF) { + if (skipFunction(MF.getFunction())) + return false; const ARMSubtarget &ST = static_cast(MF.getSubtarget()); if (!ST.hasLOB()) return false; diff --git a/llvm/lib/Target/ARM/ARMTargetMachine.cpp b/llvm/lib/Target/ARM/ARMTargetMachine.cpp index 51399941629a..237ef54c8339 100644 --- a/llvm/lib/Target/ARM/ARMTargetMachine.cpp +++ b/llvm/lib/Target/ARM/ARMTargetMachine.cpp @@ -553,11 +553,11 @@ void ARMPassConfig::addPreEmitPass() { return MF.getSubtarget().isThumb2(); })); - addPass(createARMBlockPlacementPass()); - - // Don't optimize barriers at -O0. - if (getOptLevel() != CodeGenOpt::None) + // Don't optimize barriers or block placement at -O0. + if (getOptLevel() != CodeGenOpt::None) { +addPass(createARMBlockPlacementPass()); addPass(createARMOptimizeBarriersPass()); + } } void ARMPassConfig::addPreEmitPass2() { ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 1a497ae - [ARM][Block placement] Check the predecessor exists before processing it
Author: Sam Tebbs Date: 2021-01-15T15:45:13Z New Revision: 1a497ae9b83653682d6d20f1ec131394e523375d URL: https://github.com/llvm/llvm-project/commit/1a497ae9b83653682d6d20f1ec131394e523375d DIFF: https://github.com/llvm/llvm-project/commit/1a497ae9b83653682d6d20f1ec131394e523375d.diff LOG: [ARM][Block placement] Check the predecessor exists before processing it Not all machine loops will have a predecessor. so the pass needs to check it before continuing. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D94780 Added: Modified: llvm/lib/Target/ARM/ARMBlockPlacement.cpp llvm/test/CodeGen/Thumb2/block-placement.mir Removed: diff --git a/llvm/lib/Target/ARM/ARMBlockPlacement.cpp b/llvm/lib/Target/ARM/ARMBlockPlacement.cpp index 20491273ea5d4..581b4b9857af3 100644 --- a/llvm/lib/Target/ARM/ARMBlockPlacement.cpp +++ b/llvm/lib/Target/ARM/ARMBlockPlacement.cpp @@ -79,6 +79,8 @@ bool ARMBlockPlacement::runOnMachineFunction(MachineFunction &MF) { // LE branch then move the target block after the preheader. for (auto *ML : *MLI) { MachineBasicBlock *Preheader = ML->getLoopPredecessor(); +if (!Preheader) + continue; for (auto &Terminator : Preheader->terminators()) { if (Terminator.getOpcode() != ARM::t2WhileLoopStart) diff --git a/llvm/test/CodeGen/Thumb2/block-placement.mir b/llvm/test/CodeGen/Thumb2/block-placement.mir index d96a1fb49abbb..ed4a0a6b493d8 100644 --- a/llvm/test/CodeGen/Thumb2/block-placement.mir +++ b/llvm/test/CodeGen/Thumb2/block-placement.mir @@ -25,6 +25,16 @@ entry: unreachable } + + define void @no_preheader(i32 %N, i32 %M, i32* nocapture %a, i32* nocapture %b, i32* nocapture %c) local_unnamed_addr #0 { + entry: +unreachable + } + + declare dso_local i32 @g(...) local_unnamed_addr #1 + + declare dso_local i32 @h(...) local_unnamed_addr #1 + ... --- name:backwards_branch @@ -343,3 +353,91 @@ body: | t2B %bb.1, 14 /* CC::al */, $noreg ... 
+--- +name:no_preheader +body: | + ; CHECK-LABEL: name: no_preheader + ; CHECK: bb.0: + ; CHECK: successors: %bb.2(0x3000), %bb.1(0x5000) + ; CHECK: frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r4, killed $r5, $r7, killed $lr, implicit-def $sp, implicit $sp + ; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 16 + ; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4 + ; CHECK: frame-setup CFI_INSTRUCTION offset $r7, -8 + ; CHECK: frame-setup CFI_INSTRUCTION offset $r5, -12 + ; CHECK: frame-setup CFI_INSTRUCTION offset $r4, -16 + ; CHECK: $r7 = frame-setup tADDrSPi $sp, 2, 14 /* CC::al */, $noreg + ; CHECK: frame-setup CFI_INSTRUCTION def_cfa $r7, 8 + ; CHECK: $r4 = tMOVr killed $r0, 14 /* CC::al */, $noreg + ; CHECK: tBL 14 /* CC::al */, $noreg, @g, csr_aapcs, implicit-def dead $lr, implicit $sp, implicit-def $sp, implicit-def $r0 + ; CHECK: tCMPi8 killed renamable $r0, 0, 14 /* CC::al */, $noreg, implicit-def $cpsr + ; CHECK: t2Bcc %bb.2, 0 /* CC::eq */, killed $cpsr + ; CHECK: bb.1: + ; CHECK: successors: %bb.4(0x8000) + ; CHECK: renamable $r0, dead $cpsr = tMOVi8 4, 14 /* CC::al */, $noreg + ; CHECK: renamable $r5 = t2LDRSHi12 killed renamable $r0, 0, 14 /* CC::al */, $noreg + ; CHECK: t2B %bb.4, 14 /* CC::al */, $noreg + ; CHECK: bb.2: + ; CHECK: successors: %bb.4(0x8000) + ; CHECK: renamable $r5, dead $cpsr = tMOVi8 0, 14 /* CC::al */, $noreg + ; CHECK: t2B %bb.4, 14 /* CC::al */, $noreg + ; CHECK: bb.3: + ; CHECK: successors: %bb.4(0x8000) + ; CHECK: $r0 = tMOVr $r5, 14 /* CC::al */, $noreg + ; CHECK: tBL 14 /* CC::al */, $noreg, @h, csr_aapcs, implicit-def dead $lr, implicit $sp, implicit killed $r0, implicit-def $sp, implicit-def dead $r0 + ; CHECK: bb.4: + ; CHECK: successors: %bb.5(0x0400), %bb.3(0x7c00) + ; CHECK: renamable $r0 = tLDRi renamable $r4, 0, 14 /* CC::al */, $noreg + ; CHECK: tCMPi8 killed renamable $r0, 0, 14 /* CC::al */, $noreg, implicit-def $cpsr + ; CHECK: t2Bcc %bb.3, 1 /* CC::ne */, killed $cpsr + ; CHECK: bb.5: + ; CHECK: 
frame-destroy tPOP_RET 14 /* CC::al */, $noreg, def $r4, def $r5, def $r7, def $pc + bb.0: +successors: %bb.1(0x3000), %bb.2(0x5000) +liveins: $r0, $r4, $r5, $lr + +frame-setup tPUSH 14 /* CC::al */, $noreg, killed $r4, killed $r5, $r7, killed $lr, implicit-def $sp, implicit $sp +frame-setup CFI_INSTRUCTION def_cfa_offset 16 +frame-setup CFI_INSTRUCTION offset $lr, -4 +frame-setup CFI_INSTRUCTION offset $r7, -8 +frame-setup CFI_INSTRUCTION offset $r5, -12 +frame-setup CFI_INSTRUCTION offset $r4, -16 +$r7 = frame-setup tADDrSPi $sp, 2, 14 /* CC::al */, $noreg +frame-setup CFI_INSTRUCTION def_cfa $r7, 8 +$r4 = tMOVr killed $r0, 14 /* CC::al */, $noreg +tBL 14 /* CC::al */, $nore
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
SamTebbs33 wrote: I've rebased this on top of my PR that adds an intrinsic since that's less fragile to match in the backend. So this should now be ready to have a look at. https://github.com/llvm/llvm-project/pull/100579
[llvm-branch-commits] [llvm] [AArch64] Disallow vscale x 1 partial reductions (PR #125252)
https://github.com/SamTebbs33 edited https://github.com/llvm/llvm-project/pull/125252
[llvm-branch-commits] [llvm] [AArch64] Disallow vscale x 1 partial reductions (PR #125252)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/125252 error: too big or took too long to generate
[llvm-branch-commits] [llvm] release/20.x: [AArch64] Fix op mask detection in performZExtDeinterleaveShuffleCombine (#126054) (PR #126263)
https://github.com/SamTebbs33 approved this pull request. It makes sense to merge this as it fixes a miscompilation. https://github.com/llvm/llvm-project/pull/126263
[llvm-branch-commits] [llvm] release/20.x: [AArch64] Fix op mask detection in performZExtDeinterleaveShuffleCombine (#126054) (PR #126263)
https://github.com/SamTebbs33 edited https://github.com/llvm/llvm-project/pull/126263
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -5026,10 +5026,24 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. +auto VF = VFs[J]; SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -3177,6 +3177,420 @@ for.exit:; preds = %for.body ret i32 %add } +define dso_local void @dotp_high_register_pressure(ptr %a, ptr %b, ptr %sum, i32 %n) #1 { SamTebbs33 wrote: Added. https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/SamTebbs33 created https://github.com/llvm/llvm-project/pull/133090 This PR accounts for scaled reductions in `calculateRegisterUsage` to reflect the fact that the number of lanes in their output is smaller than the VF. >From 6193c2c846710472c7e604ef33a15cda18771328 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 26 Mar 2025 14:01:59 + Subject: [PATCH] [LV] Reduce register usage for scaled reductions --- .../Transforms/Vectorize/LoopVectorize.cpp| 24 +- .../Transforms/Vectorize/VPRecipeBuilder.h| 3 +- llvm/lib/Transforms/Vectorize/VPlan.h | 14 +- .../partial-reduce-dot-product-neon.ll| 60 ++- .../AArch64/partial-reduce-dot-product.ll | 414 ++ 5 files changed, 495 insertions(+), 20 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index c9f314c0ba481..da701ef9ff1a2 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -5026,10 +5026,23 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. 
+auto VF = VFs[J]; +if (auto *ReductionR = dyn_cast(R)) + VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor()); +else if (auto *PartialReductionR = + dyn_cast(R)) + VF = VF.divideCoefficientBy(PartialReductionR->getScaleFactor()); +if (VF != VFs[J]) + LLVM_DEBUG(dbgs() << "LV(REG): Scaled down VF from " << VFs[J] +<< " to " << VF << " for "; + R->dump();); + for (VPValue *DefV : R->definedValues()) { Type *ScalarTy = TypeInfo.inferScalarType(DefV); unsigned ClassID = TTI.getRegisterClassForType(true, ScalarTy); - RegUsage[ClassID] += GetRegUsage(ScalarTy, VFs[J]); + RegUsage[ClassID] += GetRegUsage(ScalarTy, VF); } } } @@ -8963,8 +8976,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe( if (isa(Instr) || isa(Instr)) return tryToWidenMemory(Instr, Operands, Range); - if (getScalingForReduction(Instr)) -return tryToCreatePartialReduction(Instr, Operands); + if (auto ScaleFactor = getScalingForReduction(Instr)) +return tryToCreatePartialReduction(Instr, Operands, ScaleFactor.value()); if (!shouldWiden(Instr, Range)) return nullptr; @@ -8988,7 +9001,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe( VPRecipeBase * VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction, - ArrayRef Operands) { + ArrayRef Operands, + unsigned ScaleFactor) { assert(Operands.size() == 2 && "Unexpected number of operands for partial reduction"); @@ -9021,7 +9035,7 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction, BinOp = Builder.createSelect(Mask, BinOp, Zero, Reduction->getDebugLoc()); } return new VPPartialReductionRecipe(ReductionOpcode, BinOp, Accumulator, - Reduction); + ScaleFactor, Reduction); } void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF, diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h index 334cfbad8bd7c..fd0064a34c4c9 100644 --- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h +++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h 
@@ -178,7 +178,8 @@ class VPRecipeBuilder { /// Create and return a partial reduction recipe for a reduction instruction /// along with binary operation and reduction phi operands. VPRecipeBase *tryToCreatePartialReduction(Instruction *Reduction, -ArrayRef Operands); +ArrayRef Operands, +unsigned ScaleFactor); /// Set the recipe created for given ingredient. void setRecipe(Instruction *I, VPRecipeBase *R) { diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h index 80b3d2a760293..d84efb1bd6850 100644 --- a/llvm/lib/Transforms/Vectorize/VPlan.h +++ b/llvm/lib/Transforms/Vectorize/VPlan.h @@ -2001,6 +2001,8 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe, /// Generate the phi/select nodes. void execute(VPTransformState &State) override; + unsigned getVFScaleFactor() const { return VFScaleFactor; } + #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) /// Print the recipe. void print(ra
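The patch above divides the VF by a recipe's scale factor before charging register usage, because a scaled reduction produces fewer lanes than the vectorization factor. As a hedged arithmetic sketch (the recipe shapes and the 128-bit register width below are assumptions for the demo, not values taken from LLVM):

```javascript
// Illustrative model of the register-usage adjustment in the patch
// above: a partial reduction with scale factor k produces VF/k lanes,
// so it should be charged registers for VF/k elements, not VF.
function registersNeeded(vf, elementBits, registerWidthBits) {
  return Math.ceil((vf * elementBits) / registerWidthBits);
}

function usageForRecipe(recipe, vf, registerWidthBits = 128) {
  // Scaled reductions/phis emit fewer lanes than the vectorization factor.
  const effectiveVF = recipe.scaleFactor ? vf / recipe.scaleFactor : vf;
  return registersNeeded(effectiveVF, recipe.elementBits, registerWidthBits);
}

// A dot-product style partial reduction: VF=16 i8 inputs accumulate
// into VF/4 = 4 i32 lanes, so one 128-bit register instead of four.
console.log(usageForRecipe({ elementBits: 32, scaleFactor: 4 }, 16)); // 1
console.log(usageForRecipe({ elementBits: 32 }, 16));                 // 4
```

Without the adjustment, the cost model would over-count registers for partial reductions and could reject a VF that in fact fits the register file, which is the pressure problem the PR addresses.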
[llvm-branch-commits] [llvm] [LoopVectorizer] Prune VFs based on plan register pressure (PR #132190)
@@ -7772,12 +7551,23 @@ VectorizationFactor LoopVectorizationPlanner::computeBestVF() { InstructionCost Cost = cost(*P, VF); VectorizationFactor CurrentFactor(VF, Cost, ScalarCost); - if (isMoreProfitable(CurrentFactor, BestFactor)) -BestFactor = CurrentFactor; - // If profitable add it to ProfitableVF list. if (isMoreProfitable(CurrentFactor, ScalarFactor)) ProfitableVFs.push_back(CurrentFactor); SamTebbs33 wrote: Thanks for spotting that, done. https://github.com/llvm/llvm-project/pull/132190
[llvm-branch-commits] [llvm] [LoopVectorizer] Prune VFs based on plan register pressure (PR #132190)
@@ -7759,7 +7535,10 @@ VectorizationFactor LoopVectorizationPlanner::computeBestVF() { } for (auto &P : VPlans) { -for (ElementCount VF : P->vectorFactors()) { +SmallVector VFs(P->vectorFactors()); +auto RUs = ::calculateRegisterUsage(*P, VFs, TTI); +for (unsigned I = 0; I < VFs.size(); I++) { + auto VF = VFs[I]; SamTebbs33 wrote: Thanks for the suggestion, done. https://github.com/llvm/llvm-project/pull/132190
[llvm-branch-commits] [clang] [clang-tools-extra] [compiler-rt] [flang] [libc] [libcxx] [lldb] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
[Commit author list from the rebased patch stack omitted: several hundred names, MIME-encoded and truncated in the archive.]
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/SamTebbs33 edited https://github.com/llvm/llvm-project/pull/133090 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Prune VFs based on plan register pressure (PR #132190)
https://github.com/SamTebbs33 edited https://github.com/llvm/llvm-project/pull/132190
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -253,38 +253,38 @@ define i64 @not_dotp_i8_to_i64_has_neon_dotprod(ptr readonly %a, ptr readonly %b
 ; CHECK-MAXBW-SAME: ptr readonly [[A:%.*]], ptr readonly [[B:%.*]]) #[[ATTR1:[0-9]+]] {
 ; CHECK-MAXBW-NEXT:  entry:
 ; CHECK-MAXBW-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-MAXBW-NEXT:    [[TMP1:%.*]] = mul i64 [[TMP0]], 8
+; CHECK-MAXBW-NEXT:    [[TMP1:%.*]] = mul i64 [[TMP0]], 16
 ; CHECK-MAXBW-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK-MAXBW:       vector.ph:
 ; CHECK-MAXBW-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-MAXBW-NEXT:    [[TMP3:%.*]] = mul i64 [[TMP2]], 8
+; CHECK-MAXBW-NEXT:    [[TMP3:%.*]] = mul i64 [[TMP2]], 16
 ; CHECK-MAXBW-NEXT:    [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP3]]
 ; CHECK-MAXBW-NEXT:    [[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]]
 ; CHECK-MAXBW-NEXT:    [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-MAXBW-NEXT:    [[TMP5:%.*]] = mul i64 [[TMP4]], 8
+; CHECK-MAXBW-NEXT:    [[TMP5:%.*]] = mul i64 [[TMP4]], 16
 ; CHECK-MAXBW-NEXT:    [[TMP6:%.*]] = getelementptr i8, ptr [[A]], i64 [[N_VEC]]
 ; CHECK-MAXBW-NEXT:    [[TMP7:%.*]] = getelementptr i8, ptr [[B]], i64 [[N_VEC]]
 ; CHECK-MAXBW-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK-MAXBW:       vector.body:
 ; CHECK-MAXBW-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-MAXBW-NEXT:    [[VEC_PHI:%.*]] = phi <vscale x 8 x i64> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]
+; CHECK-MAXBW-NEXT:    [[VEC_PHI:%.*]] = phi <vscale x 2 x i64> [ zeroinitializer, [[VECTOR_PH]] ], [ [[PARTIAL_REDUCE:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-MAXBW-NEXT:    [[TMP8:%.*]] = add i64 [[INDEX]], 0
 ; CHECK-MAXBW-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP8]]
 ; CHECK-MAXBW-NEXT:    [[TMP9:%.*]] = add i64 [[INDEX]], 0
 ; CHECK-MAXBW-NEXT:    [[NEXT_GEP1:%.*]] = getelementptr i8, ptr [[B]], i64 [[TMP9]]
 ; CHECK-MAXBW-NEXT:    [[TMP10:%.*]] = getelementptr i8, ptr [[NEXT_GEP]], i32 0
-; CHECK-MAXBW-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 8 x i8>, ptr [[TMP10]], align 1
-; CHECK-MAXBW-NEXT:    [[TMP11:%.*]] = zext <vscale x 8 x i8> [[WIDE_LOAD]] to <vscale x 8 x i64>
+; CHECK-MAXBW-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 16 x i8>, ptr [[TMP10]], align 1
 ; CHECK-MAXBW-NEXT:    [[TMP12:%.*]] = getelementptr i8, ptr [[NEXT_GEP1]], i32 0
-; CHECK-MAXBW-NEXT:    [[WIDE_LOAD2:%.*]] = load <vscale x 8 x i8>, ptr [[TMP12]], align 1
-; CHECK-MAXBW-NEXT:    [[TMP13:%.*]] = zext <vscale x 8 x i8> [[WIDE_LOAD2]] to <vscale x 8 x i64>
-; CHECK-MAXBW-NEXT:    [[TMP14:%.*]] = mul nuw nsw <vscale x 8 x i64> [[TMP13]], [[TMP11]]
-; CHECK-MAXBW-NEXT:    [[TMP15]] = add <vscale x 8 x i64> [[TMP14]], [[VEC_PHI]]
+; CHECK-MAXBW-NEXT:    [[WIDE_LOAD2:%.*]] = load <vscale x 16 x i8>, ptr [[TMP12]], align 1
+; CHECK-MAXBW-NEXT:    [[TMP15:%.*]] = zext <vscale x 16 x i8> [[WIDE_LOAD2]] to <vscale x 16 x i64>
+; CHECK-MAXBW-NEXT:    [[TMP13:%.*]] = zext <vscale x 16 x i8> [[WIDE_LOAD]] to <vscale x 16 x i64>
+; CHECK-MAXBW-NEXT:    [[TMP14:%.*]] = mul nuw nsw <vscale x 16 x i64> [[TMP15]], [[TMP13]]
+; CHECK-MAXBW-NEXT:    [[PARTIAL_REDUCE]] = call <vscale x 2 x i64> @llvm.experimental.vector.partial.reduce.add.nxv2i64.nxv16i64(<vscale x 2 x i64> [[VEC_PHI]], <vscale x 16 x i64> [[TMP14]])

SamTebbs33 wrote:

Ah, it looks like the cost that previously stopped it from choosing a 16i8 -> 2i64 partial reduction isn't sufficiently high now that the extend cost is hidden. I've made this permutation invalid.

https://github.com/llvm/llvm-project/pull/136173
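For readers unfamiliar with the intrinsic in the new CHECK lines, a scalar model of `llvm.experimental.vector.partial.reduce.add` may help. This is a sketch with fixed widths instead of scalable vectors; the lane-to-accumulator mapping is target-defined in LLVM, and the round-robin mapping used here is just one valid choice.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a 16-lane -> 2-lane partial add reduction, in the spirit of
// llvm.experimental.vector.partial.reduce.add.nxv2i64.nxv16i64 (here with
// fixed rather than scalable widths). Each lane of the wide input is folded
// into one lane of the narrow accumulator; the full scalar sum is only formed
// by a final reduction after the loop.
std::array<int64_t, 2> partialReduceAdd(std::array<int64_t, 2> Acc,
                                        const std::array<int64_t, 16> &Wide) {
  for (std::size_t I = 0; I < Wide.size(); ++I)
    Acc[I % Acc.size()] += Wide[I]; // round-robin lane mapping (one valid choice)
  return Acc;
}
```

Whatever mapping a target picks, the per-lane sums always add up to the same scalar total, which is why the mapping can be left unspecified.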
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2376,6 +2327,59 @@ class VPReductionRecipe : public VPRecipeWithIRFlags {
   }
 };

+/// A recipe for forming partial reductions. In the loop, an accumulator and
+/// vector operand are added together and passed to the next iteration as the
+/// next accumulator. After the loop body, the accumulator is reduced to a
+/// scalar value.
+class VPPartialReductionRecipe : public VPReductionRecipe {

SamTebbs33 wrote:

Done.

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -219,6 +219,8 @@ class TargetTransformInfo {
   /// Get the kind of extension that an instruction represents.
   static PartialReductionExtendKind
   getPartialReductionExtendKind(Instruction *I);
+  static PartialReductionExtendKind
+  getPartialReductionExtendKind(Instruction::CastOps ExtOpcode);

SamTebbs33 wrote:

Using the `CastOps` one in the other is a good idea. Done.

https://github.com/llvm/llvm-project/pull/136173
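The refactor agreed on above — implementing the `Instruction *` overload in terms of the `CastOps` one — can be sketched in standalone form. The enums and `Inst` struct below are stand-ins for the real LLVM types (`llvm::CastInst`, `TargetTransformInfo::PartialReductionExtendKind`), not the actual API.

```cpp
#include <cassert>

// Stand-in types for the sketch; the real code uses llvm::Instruction,
// llvm::CastInst and TargetTransformInfo::PartialReductionExtendKind.
enum class CastOp { ZExt, SExt, Trunc };
enum class PRExtKind { None, SignExtend, ZeroExtend };

struct Inst {
  bool IsCast = false;
  CastOp Op = CastOp::Trunc;
};

// Opcode-based overload: the single source of truth for the mapping.
PRExtKind getPartialReductionExtendKind(CastOp Op) {
  switch (Op) {
  case CastOp::ZExt:
    return PRExtKind::ZeroExtend;
  case CastOp::SExt:
    return PRExtKind::SignExtend;
  default:
    return PRExtKind::None;
  }
}

// Instruction-based overload delegates instead of duplicating the switch.
PRExtKind getPartialReductionExtendKind(const Inst &I) {
  if (!I.IsCast)
    return PRExtKind::None;
  return getPartialReductionExtendKind(I.Op);
}
```

Keeping the mapping in one switch means a future extend kind only has to be added in one place.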
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2056,55 +2056,6 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   }
 };

-/// A recipe for forming partial reductions.

SamTebbs33 wrote:

I don't think I could make it an NFC change, since to conform to `VPReductionRecipe`, the accumulator and binop have to be swapped around.

https://github.com/llvm/llvm-project/pull/136173
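A toy illustration of why that swap is observable (hypothetical, heavily simplified — these are not the LLVM classes): the base reduction layout stores the accumulator (chain) operand first, while the old standalone partial-reduction recipe stored the bundled binop first, so conforming to the base class reorders operands rather than leaving behaviour untouched.

```cpp
#include <array>
#include <cassert>

// Toy model (not the LLVM classes) of the operand reordering mentioned above.
// The base reduction layout stores the accumulator (chain) operand first.
struct ReductionLayout {
  std::array<int, 2> Ops; // {accumulator, vector/binop operand}, IDs only
  int chainOp() const { return Ops[0]; }
  int vecOp() const { return Ops[1]; }
};

// Adapting the old {binop, accumulator} order to the base-class layout means
// swapping the two operands -- an observable change, hence not NFC.
ReductionLayout fromOldPartialReductionLayout(int BinOpId, int AccId) {
  return ReductionLayout{{AccId, BinOpId}};
}
```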
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
https://github.com/SamTebbs33 created https://github.com/llvm/llvm-project/pull/136997 This PR adds support for extensions of different signedness to VPMulAccumulateReductionRecipe and allows such partial reductions to be bundled into that class. >From 10c4727074a7f5b4502ad08dc655be8fa5ffa3d2 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 23 Apr 2025 13:16:38 +0100 Subject: [PATCH] [LoopVectorizer] Bundle partial reductions with different extensions This PR adds support for extensions of different signedness to VPMulAccumulateReductionRecipe and allows such partial reductions to be bundled into that class. --- llvm/lib/Transforms/Vectorize/VPlan.h | 42 +- .../lib/Transforms/Vectorize/VPlanRecipes.cpp | 27 ++--- .../Transforms/Vectorize/VPlanTransforms.cpp | 25 - .../partial-reduce-dot-product-mixed.ll | 56 +-- .../LoopVectorize/AArch64/vplan-printing.ll | 29 +- 5 files changed, 99 insertions(+), 80 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h index 20d272e69e6e7..e11f608d068da 100644 --- a/llvm/lib/Transforms/Vectorize/VPlan.h +++ b/llvm/lib/Transforms/Vectorize/VPlan.h @@ -2493,11 +2493,13 @@ class VPExtendedReductionRecipe : public VPReductionRecipe { /// recipe is abstract and needs to be lowered to concrete recipes before /// codegen. The Operands are {ChainOp, VecOp1, VecOp2, [Condition]}. class VPMulAccumulateReductionRecipe : public VPReductionRecipe { - /// Opcode of the extend recipe. - Instruction::CastOps ExtOp; + /// Opcodes of the extend recipes. + Instruction::CastOps ExtOp0; + Instruction::CastOps ExtOp1; - /// Non-neg flag of the extend recipe. - bool IsNonNeg = false; + /// Non-neg flags of the extend recipe. 
+ bool IsNonNeg0 = false; + bool IsNonNeg1 = false; Type *ResultTy; @@ -2512,7 +2514,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { MulAcc->getCondOp(), MulAcc->isOrdered(), WrapFlagsTy(MulAcc->hasNoUnsignedWrap(), MulAcc->hasNoSignedWrap()), MulAcc->getDebugLoc()), -ExtOp(MulAcc->getExtOpcode()), IsNonNeg(MulAcc->isNonNeg()), +ExtOp0(MulAcc->getExt0Opcode()), ExtOp1(MulAcc->getExt1Opcode()), +IsNonNeg0(MulAcc->isNonNeg0()), IsNonNeg1(MulAcc->isNonNeg1()), ResultTy(MulAcc->getResultType()), IsPartialReduction(MulAcc->isPartialReduction()) {} @@ -2526,7 +2529,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { R->getCondOp(), R->isOrdered(), WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()), R->getDebugLoc()), -ExtOp(Ext0->getOpcode()), IsNonNeg(Ext0->isNonNeg()), +ExtOp0(Ext0->getOpcode()), ExtOp1(Ext1->getOpcode()), +IsNonNeg0(Ext0->isNonNeg()), IsNonNeg1(Ext1->isNonNeg()), ResultTy(ResultTy), IsPartialReduction(isa(R)) { assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) == @@ -2542,7 +2546,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { R->getCondOp(), R->isOrdered(), WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()), R->getDebugLoc()), -ExtOp(Instruction::CastOps::CastOpsEnd) { +ExtOp0(Instruction::CastOps::CastOpsEnd), +ExtOp1(Instruction::CastOps::CastOpsEnd) { assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) == Instruction::Add && "The reduction instruction in MulAccumulateReductionRecipe must be " @@ -2586,19 +2591,26 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { VPValue *getVecOp1() const { return getOperand(2); } /// Return if this MulAcc recipe contains extend instructions. - bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; } + bool isExtended() const { return ExtOp0 != Instruction::CastOps::CastOpsEnd; } /// Return if the operands of mul instruction come from same extend. 
- bool isSameExtend() const { return getVecOp0() == getVecOp1(); } + bool isSameExtendVal() const { return getVecOp0() == getVecOp1(); } - /// Return the opcode of the underlying extend. - Instruction::CastOps getExtOpcode() const { return ExtOp; } + /// Return the opcode of the underlying extends. + Instruction::CastOps getExt0Opcode() const { return ExtOp0; } + Instruction::CastOps getExt1Opcode() const { return ExtOp1; } + + /// Return if the first extend's opcode is ZExt. + bool isZExt0() const { return ExtOp0 == Instruction::CastOps::ZExt; } + + /// Return if the second extend's opcode is ZExt. + bool isZExt1() const { return ExtOp1 == Instruction::CastOps::ZExt; } - /// Return if the extend opcode is ZExt. - bool isZExt() const { return ExtOp == Instruction::CastOps::ZExt; } + /// Return the non negative flag of the first ext recipe. + bool isNonNeg0() const { return IsNonNe
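The semantics the patch above is after — independent extend kinds per mul operand (`ExtOp0`/`ExtOp1`) — can be shown with a small scalar model. This is a sketch: the real recipe operates on vectors, and the 8-bit-to-64-bit widths here are illustrative. Mixed signedness is what enables, for example, the unsigned-by-signed dot products AArch64's `usdot` computes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ExtKind { ZExt, SExt };

// Extend an 8-bit lane to 64 bits with an explicit signedness, mirroring the
// per-operand ExtOp0/ExtOp1 fields the patch introduces.
int64_t extendLane(uint8_t V, ExtKind K) {
  return K == ExtKind::SExt ? static_cast<int64_t>(static_cast<int8_t>(V))
                            : static_cast<int64_t>(V);
}

// Scalar model of the bundled multiply-accumulate reduction: extend each
// operand with its own kind, multiply, and accumulate.
int64_t mulAccReduce(const std::vector<uint8_t> &A,
                     const std::vector<uint8_t> &B, ExtKind KindA,
                     ExtKind KindB, int64_t Acc) {
  for (std::size_t I = 0; I < A.size(); ++I)
    Acc += extendLane(A[I], KindA) * extendLane(B[I], KindB);
  return Acc;
}
```

With a single shared extend kind (the situation before this patch), the sext/zext mix in the first assertion below could not be expressed by one bundled recipe.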
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -986,11 +986,23 @@ InstructionCost TargetTransformInfo::getShuffleCost(
 TargetTransformInfo::PartialReductionExtendKind
 TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
-  if (isa<SExtInst>(I))
-    return PR_SignExtend;
-  if (isa<ZExtInst>(I))
+  auto *Cast = dyn_cast<CastInst>(I);
+  if (!Cast)
+    return PR_None;
+  return getPartialReductionExtendKind(Cast->getOpcode());
+}
+
+TargetTransformInfo::PartialReductionExtendKind
+TargetTransformInfo::getPartialReductionExtendKind(
+    Instruction::CastOps ExtOpcode) {
+  switch (ExtOpcode) {
+  case Instruction::CastOps::ZExt:
     return PR_ZeroExtend;
-  return PR_None;
+  case Instruction::CastOps::SExt:
+    return PR_SignExtend;
+  default:
+    return PR_None;

SamTebbs33 wrote:

Done.

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -986,11 +986,23 @@ InstructionCost TargetTransformInfo::getShuffleCost(
 TargetTransformInfo::PartialReductionExtendKind
 TargetTransformInfo::getPartialReductionExtendKind(Instruction *I) {
-  if (isa<SExtInst>(I))
-    return PR_SignExtend;
-  if (isa<ZExtInst>(I))
+  auto *Cast = dyn_cast<CastInst>(I);
+  if (!Cast)
+    return PR_None;
+  return getPartialReductionExtendKind(Cast->getOpcode());

SamTebbs33 wrote:

Done.

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2432,12 +2437,40 @@ static void tryToCreateAbstractReductionRecipe(VPReductionRecipe *Red,
   Red->replaceAllUsesWith(AbstractR);
 }

+/// This function tries to create an abstract recipe from a partial reduction to
+/// hide its mul and extends from cost estimation.
+static void
+tryToCreateAbstractPartialReductionRecipe(VPPartialReductionRecipe *PRed) {

SamTebbs33 wrote:

At this point we've already created the partial reduction and clamped the range, so I don't think we need to do any costing (like `tryToMatchAndCreateMulAccumulateReduction` does with `getMulAccReductionCost`), since we already know it's worthwhile (see `getScaledReductions` in LoopVectorize.cpp). This part of the code just puts the partial reduction inside the abstract recipe, which shouldn't need to consider any costing.

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2432,12 +2437,40 @@ static void tryToCreateAbstractReductionRecipe(VPReductionRecipe *Red,
   Red->replaceAllUsesWith(AbstractR);
 }

+/// This function tries to create an abstract recipe from a partial reduction to
+/// hide its mul and extends from cost estimation.
+static void
+tryToCreateAbstractPartialReductionRecipe(VPPartialReductionRecipe *PRed) {
+  if (PRed->getOpcode() != Instruction::Add)
+    return;
+
+  VPRecipeBase *BinOpR = PRed->getBinOp()->getDefiningRecipe();
+  auto *BinOp = dyn_cast<VPWidenRecipe>(BinOpR);
+  if (!BinOp || BinOp->getOpcode() != Instruction::Mul)
+    return;

SamTebbs33 wrote:

Done :+1:.

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -2056,55 +2056,6 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
   }
 };

-/// A recipe for forming partial reductions.

SamTebbs33 wrote:

I've pre-committed the NFC, but rebasing Elvis's changes on top of that has been pretty challenging considering the number of commits on that branch. So I will cherry-pick the NFC on to this branch and it'll just go away once Elvis's PR lands and I rebase this PR on top of main.

https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
@@ -4923,9 +4923,7 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
       return Invalid;
     break;
   case 16:
-    if (AccumEVT == MVT::i64)
-      Cost *= 2;
-    else if (AccumEVT != MVT::i32)
+    if (AccumEVT != MVT::i32)

SamTebbs33 wrote:

Good spot. Done.

https://github.com/llvm/llvm-project/pull/136173
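The effect of the hunk above on the AArch64 partial-reduction cost table can be modelled as follows. This is a sketch with a made-up base cost and a reduced set of types, not the real `AArch64TTIImpl` logic: with the extends bundled into the recipe and hidden from costing, a 16-element i8 input remains valid only for an i32 accumulator, and the old path that doubled the cost for an i64 accumulator is gone.

```cpp
#include <cassert>
#include <optional>

enum class EVT { I32, I64 };

// Hypothetical cost model mirroring the diff above: returns the (made-up)
// cost for a 16-element i8 partial reduction, or nullopt for "Invalid".
std::optional<unsigned> partialReductionCost16xI8(EVT AccumEVT) {
  unsigned Cost = 1; // placeholder base cost
  if (AccumEVT != EVT::I32)
    return std::nullopt; // previously AccumEVT == i64 doubled Cost instead
  return Cost;
}
```

Returning "Invalid" rather than a doubled cost is what makes the 16i8 -> 2i64 permutation discussed earlier in the thread unselectable.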
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
@@ -2493,11 +2493,13 @@ class VPExtendedReductionRecipe : public VPReductionRecipe {
 /// recipe is abstract and needs to be lowered to concrete recipes before
 /// codegen. The Operands are {ChainOp, VecOp1, VecOp2, [Condition]}.
 class VPMulAccumulateReductionRecipe : public VPReductionRecipe {
-  /// Opcode of the extend recipe.
-  Instruction::CastOps ExtOp;
+  /// Opcodes of the extend recipes.

SamTebbs33 wrote:

I like that, thanks. Added.

https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
@@ -2438,14 +2438,14 @@ VPMulAccumulateReductionRecipe::computeCost(ElementCount VF,
     return Ctx.TTI.getPartialReductionCost(
         Instruction::Add, Ctx.Types.inferScalarType(getVecOp0()),
         Ctx.Types.inferScalarType(getVecOp1()), getResultType(), VF,
-        TTI::getPartialReductionExtendKind(getExtOpcode()),
-        TTI::getPartialReductionExtendKind(getExtOpcode()), Instruction::Mul);
+        TTI::getPartialReductionExtendKind(getExt0Opcode()),
+        TTI::getPartialReductionExtendKind(getExt1Opcode()), Instruction::Mul);
   }

   Type *RedTy = Ctx.Types.inferScalarType(this);
   auto *SrcVecTy =
       cast<VectorType>(toVectorTy(Ctx.Types.inferScalarType(getVecOp0()), VF));
-  return Ctx.TTI.getMulAccReductionCost(isZExt(), RedTy, SrcVecTy,
+  return Ctx.TTI.getMulAccReductionCost(isZExt0(), RedTy, SrcVecTy,

SamTebbs33 wrote:

I started off by modifying the TTI hook but found that it wasn't actually necessary, since only partial reductions make use of the differing signedness and they don't use this hook. If someone is interested in getting mul-acc-reduce generated with different extensions, they can do the investigation needed for costing, but I think it's outside the scope of this work.

https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
SamTebbs33 wrote: Yeah that's the case :). Let me know if you have any issues applying it after applying 113903 too. https://github.com/llvm/llvm-project/pull/136173
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -427,6 +428,29 @@ Value *VPInstruction::generate(VPTransformState &State) {
         {PredTy, ScalarTC->getType()}, {VIVElem0, ScalarTC}, nullptr, Name);
   }
+  // Count the number of bits set in each lane and reduce the result to a scalar
+  case VPInstruction::PopCount: {
+    Value *Op = State.get(getOperand(0));
+    auto *VT = Op->getType();

SamTebbs33 wrote:

Done.

https://github.com/llvm/llvm-project/pull/100579
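A scalar model of the `PopCount` case quoted above: given a predicate vector (one boolean per lane), count the active lanes and produce a single scalar. The quoted snippet builds the equivalent with IR intrinsics; a plain loop stands in for that here.

```cpp
#include <cstdint>
#include <vector>

// Count the number of active (true) lanes of a predicate vector and reduce
// the result to a single scalar -- the semantics of VPInstruction::PopCount.
uint64_t popCountLanes(const std::vector<bool> &Mask) {
  uint64_t Count = 0;
  for (bool Lane : Mask)
    Count += Lane ? 1 : 0;
  return Count;
}
```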
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -418,7 +418,13 @@ class LoopVectorizationPlanner {
   /// Build VPlans for the specified \p UserVF and \p UserIC if they are
   /// non-zero or all applicable candidate VFs otherwise. If vectorization and
   /// interleaving should be avoided up-front, no plans are generated.
-  void plan(ElementCount UserVF, unsigned UserIC);
+  /// RTChecks is a list of pointer pairs that should be checked for aliasing,
+  /// setting HasAliasMask to true in the case that an alias mask is generated

SamTebbs33 wrote:

Done, thanks.

https://github.com/llvm/llvm-project/pull/100579
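The idea behind the `RTChecks` pointer pairs can be sketched as follows. This is a simplified model of an alias lane mask, not the exact predicate the PR emits; the overlap rule below (a lane is safe when the pointer distance covers at least that many elements, and equal pointers are treated as lane-wise safe) is an illustrative assumption. The point is that instead of bailing out to a scalar loop when two pointers might overlap, only the lanes whose accesses could conflict are masked off.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model of an alias lane mask for a pointer pair (A, B): lane I is
// enabled only if accessing element I through both pointers cannot conflict
// within one vector iteration, under the assumed overlap rule above.
std::vector<bool> aliasLaneMask(uintptr_t A, uintptr_t B, std::size_t EltSize,
                                std::size_t VF) {
  std::vector<bool> Mask(VF, true);
  uintptr_t Dist = A > B ? A - B : B - A;
  if (Dist == 0)
    return Mask; // identical pointers: each lane touches its own element
  std::size_t SafeLanes = Dist / EltSize; // lanes below the overlap distance
  for (std::size_t I = SafeLanes; I < VF; ++I)
    Mask[I] = false;
  return Mask;
}
```

When the pointers are far apart, every lane stays enabled and the loop runs at full width; partial overlap only disables the trailing lanes.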
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/136997 >From 10c4727074a7f5b4502ad08dc655be8fa5ffa3d2 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 23 Apr 2025 13:16:38 +0100 Subject: [PATCH 1/3] [LoopVectorizer] Bundle partial reductions with different extensions This PR adds support for extensions of different signedness to VPMulAccumulateReductionRecipe and allows such partial reductions to be bundled into that class. --- llvm/lib/Transforms/Vectorize/VPlan.h | 42 +- .../lib/Transforms/Vectorize/VPlanRecipes.cpp | 27 ++--- .../Transforms/Vectorize/VPlanTransforms.cpp | 25 - .../partial-reduce-dot-product-mixed.ll | 56 +-- .../LoopVectorize/AArch64/vplan-printing.ll | 29 +- 5 files changed, 99 insertions(+), 80 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h index 20d272e69e6e7..e11f608d068da 100644 --- a/llvm/lib/Transforms/Vectorize/VPlan.h +++ b/llvm/lib/Transforms/Vectorize/VPlan.h @@ -2493,11 +2493,13 @@ class VPExtendedReductionRecipe : public VPReductionRecipe { /// recipe is abstract and needs to be lowered to concrete recipes before /// codegen. The Operands are {ChainOp, VecOp1, VecOp2, [Condition]}. class VPMulAccumulateReductionRecipe : public VPReductionRecipe { - /// Opcode of the extend recipe. - Instruction::CastOps ExtOp; + /// Opcodes of the extend recipes. + Instruction::CastOps ExtOp0; + Instruction::CastOps ExtOp1; - /// Non-neg flag of the extend recipe. - bool IsNonNeg = false; + /// Non-neg flags of the extend recipe. 
+ bool IsNonNeg0 = false; + bool IsNonNeg1 = false; Type *ResultTy; @@ -2512,7 +2514,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { MulAcc->getCondOp(), MulAcc->isOrdered(), WrapFlagsTy(MulAcc->hasNoUnsignedWrap(), MulAcc->hasNoSignedWrap()), MulAcc->getDebugLoc()), -ExtOp(MulAcc->getExtOpcode()), IsNonNeg(MulAcc->isNonNeg()), +ExtOp0(MulAcc->getExt0Opcode()), ExtOp1(MulAcc->getExt1Opcode()), +IsNonNeg0(MulAcc->isNonNeg0()), IsNonNeg1(MulAcc->isNonNeg1()), ResultTy(MulAcc->getResultType()), IsPartialReduction(MulAcc->isPartialReduction()) {} @@ -2526,7 +2529,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { R->getCondOp(), R->isOrdered(), WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()), R->getDebugLoc()), -ExtOp(Ext0->getOpcode()), IsNonNeg(Ext0->isNonNeg()), +ExtOp0(Ext0->getOpcode()), ExtOp1(Ext1->getOpcode()), +IsNonNeg0(Ext0->isNonNeg()), IsNonNeg1(Ext1->isNonNeg()), ResultTy(ResultTy), IsPartialReduction(isa(R)) { assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) == @@ -2542,7 +2546,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { R->getCondOp(), R->isOrdered(), WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()), R->getDebugLoc()), -ExtOp(Instruction::CastOps::CastOpsEnd) { +ExtOp0(Instruction::CastOps::CastOpsEnd), +ExtOp1(Instruction::CastOps::CastOpsEnd) { assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) == Instruction::Add && "The reduction instruction in MulAccumulateReductionRecipe must be " @@ -2586,19 +2591,26 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { VPValue *getVecOp1() const { return getOperand(2); } /// Return if this MulAcc recipe contains extend instructions. - bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; } + bool isExtended() const { return ExtOp0 != Instruction::CastOps::CastOpsEnd; } /// Return if the operands of mul instruction come from same extend. 
- bool isSameExtend() const { return getVecOp0() == getVecOp1(); } + bool isSameExtendVal() const { return getVecOp0() == getVecOp1(); } - /// Return the opcode of the underlying extend. - Instruction::CastOps getExtOpcode() const { return ExtOp; } + /// Return the opcode of the underlying extends. + Instruction::CastOps getExt0Opcode() const { return ExtOp0; } + Instruction::CastOps getExt1Opcode() const { return ExtOp1; } + + /// Return if the first extend's opcode is ZExt. + bool isZExt0() const { return ExtOp0 == Instruction::CastOps::ZExt; } + + /// Return if the second extend's opcode is ZExt. + bool isZExt1() const { return ExtOp1 == Instruction::CastOps::ZExt; } - /// Return if the extend opcode is ZExt. - bool isZExt() const { return ExtOp == Instruction::CastOps::ZExt; } + /// Return the non negative flag of the first ext recipe. + bool isNonNeg0() const { return IsNonNeg0; } - /// Return the non negative flag of the ext recipe. - bool isNonNeg() const { return IsNonNeg; } + /// Return the non negative flag of the second
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/133090 >From 6193c2c846710472c7e604ef33a15cda18771328 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 26 Mar 2025 14:01:59 + Subject: [PATCH 1/3] [LV] Reduce register usage for scaled reductions --- .../Transforms/Vectorize/LoopVectorize.cpp| 24 +- .../Transforms/Vectorize/VPRecipeBuilder.h| 3 +- llvm/lib/Transforms/Vectorize/VPlan.h | 14 +- .../partial-reduce-dot-product-neon.ll| 60 ++- .../AArch64/partial-reduce-dot-product.ll | 414 ++ 5 files changed, 495 insertions(+), 20 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index c9f314c0ba481..da701ef9ff1a2 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -5026,10 +5026,23 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. 
+auto VF = VFs[J]; +if (auto *ReductionR = dyn_cast(R)) + VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor()); +else if (auto *PartialReductionR = + dyn_cast(R)) + VF = VF.divideCoefficientBy(PartialReductionR->getScaleFactor()); +if (VF != VFs[J]) + LLVM_DEBUG(dbgs() << "LV(REG): Scaled down VF from " << VFs[J] +<< " to " << VF << " for "; + R->dump();); + for (VPValue *DefV : R->definedValues()) { Type *ScalarTy = TypeInfo.inferScalarType(DefV); unsigned ClassID = TTI.getRegisterClassForType(true, ScalarTy); - RegUsage[ClassID] += GetRegUsage(ScalarTy, VFs[J]); + RegUsage[ClassID] += GetRegUsage(ScalarTy, VF); } } } @@ -8963,8 +8976,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe( if (isa(Instr) || isa(Instr)) return tryToWidenMemory(Instr, Operands, Range); - if (getScalingForReduction(Instr)) -return tryToCreatePartialReduction(Instr, Operands); + if (auto ScaleFactor = getScalingForReduction(Instr)) +return tryToCreatePartialReduction(Instr, Operands, ScaleFactor.value()); if (!shouldWiden(Instr, Range)) return nullptr; @@ -8988,7 +9001,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe( VPRecipeBase * VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction, - ArrayRef Operands) { + ArrayRef Operands, + unsigned ScaleFactor) { assert(Operands.size() == 2 && "Unexpected number of operands for partial reduction"); @@ -9021,7 +9035,7 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction, BinOp = Builder.createSelect(Mask, BinOp, Zero, Reduction->getDebugLoc()); } return new VPPartialReductionRecipe(ReductionOpcode, BinOp, Accumulator, - Reduction); + ScaleFactor, Reduction); } void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF, diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h index 334cfbad8bd7c..fd0064a34c4c9 100644 --- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h +++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h 
@@ -178,7 +178,8 @@ class VPRecipeBuilder { /// Create and return a partial reduction recipe for a reduction instruction /// along with binary operation and reduction phi operands. VPRecipeBase *tryToCreatePartialReduction(Instruction *Reduction, -ArrayRef Operands); +ArrayRef Operands, +unsigned ScaleFactor); /// Set the recipe created for given ingredient. void setRecipe(Instruction *I, VPRecipeBase *R) { diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h index 80b3d2a760293..d84efb1bd6850 100644 --- a/llvm/lib/Transforms/Vectorize/VPlan.h +++ b/llvm/lib/Transforms/Vectorize/VPlan.h @@ -2001,6 +2001,8 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe, /// Generate the phi/select nodes. void execute(VPTransformState &State) override; + unsigned getVFScaleFactor() const { return VFScaleFactor; } + #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) /// Print the recipe. void print(raw_ostream &O, const Twine &Indent, @@ -2031,17 +2033,19 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe, /// scalar value. class VPPartialR
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -5026,10 +5026,23 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef<ElementCount> VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. +auto VF = VFs[J]; +if (auto *ReductionR = dyn_cast<VPReductionPHIRecipe>(R)) + VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor()); +else if (auto *PartialReductionR = + dyn_cast<VPPartialReductionRecipe>(R)) + VF = VF.divideCoefficientBy(PartialReductionR->getScaleFactor()); +if (VF != VFs[J]) SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/133090
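The arithmetic in the hunk above is small but easy to misread for scalable VFs: only the VF's coefficient is divided by the recipe's scale factor, the vscale multiplier is untouched. A rough standalone model (the `EC` struct and names are invented stand-ins for LLVM's `ElementCount`, not the real API):

```cpp
// Toy stand-in for llvm::ElementCount: Min lanes, optionally scaled by vscale.
struct EC {
  unsigned Min;
  bool Scalable;
};

// Mirrors the spirit of ElementCount::divideCoefficientBy(): only the
// coefficient shrinks; whether the count is vscale-relative is preserved.
EC divideCoefficientBy(EC VF, unsigned ScaleFactor) {
  return {VF.Min / ScaleFactor, VF.Scalable};
}

// Register-usage estimation should cost a scaled reduction at the narrower
// width, e.g. a partial reduction with scale factor 4 on a <vscale x 16 x i8>
// input accumulates into <vscale x 4 x i32>.
unsigned lanesForRegUsage(EC VF, unsigned ScaleFactor) {
  return divideCoefficientBy(VF, ScaleFactor).Min;
}
```

This is why the patch reports "Scaled down VF from X to Y" in the debug output before summing per-class register usage.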
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/SamTebbs33 edited https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -2031,17 +2033,19 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe, /// scalar value. class VPPartialReductionRecipe : public VPSingleDefRecipe { unsigned Opcode; + unsigned ScaleFactor; SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -5026,10 +5026,23 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef<ElementCount> VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. +auto VF = VFs[J]; +if (auto *ReductionR = dyn_cast<VPReductionPHIRecipe>(R)) SamTebbs33 wrote: Yeah that's a nice idea. We could add a `VPScaledRecipe` class. I agree with doing it afterwards. https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/133090 >From d0a9e1c7e89abc5890d7303a2e22a9a56e2f022b Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 26 Mar 2025 14:01:59 + Subject: [PATCH 1/6] [LV] Reduce register usage for scaled reductions --- .../Transforms/Vectorize/LoopVectorize.cpp| 24 ++- .../Transforms/Vectorize/VPRecipeBuilder.h| 3 +- llvm/lib/Transforms/Vectorize/VPlan.h | 14 +- .../partial-reduce-dot-product-neon.ll| 116 .../AArch64/partial-reduce-dot-product.ll | 173 ++ .../LoopVectorize/AArch64/reg-usage.ll| 6 +- 6 files changed, 171 insertions(+), 165 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 2ebc7017f426a..486405991c612 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -5036,10 +5036,23 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. 
+auto VF = VFs[J]; +if (auto *ReductionR = dyn_cast(R)) + VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor()); +else if (auto *PartialReductionR = + dyn_cast(R)) + VF = VF.divideCoefficientBy(PartialReductionR->getScaleFactor()); +if (VF != VFs[J]) + LLVM_DEBUG(dbgs() << "LV(REG): Scaled down VF from " << VFs[J] +<< " to " << VF << " for "; + R->dump();); + for (VPValue *DefV : R->definedValues()) { Type *ScalarTy = TypeInfo.inferScalarType(DefV); unsigned ClassID = TTI.getRegisterClassForType(true, ScalarTy); - RegUsage[ClassID] += GetRegUsage(ScalarTy, VFs[J]); + RegUsage[ClassID] += GetRegUsage(ScalarTy, VF); } } } @@ -9137,8 +9150,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe( if (isa(Instr) || isa(Instr)) return tryToWidenMemory(Instr, Operands, Range); - if (getScalingForReduction(Instr)) -return tryToCreatePartialReduction(Instr, Operands); + if (auto ScaleFactor = getScalingForReduction(Instr)) +return tryToCreatePartialReduction(Instr, Operands, ScaleFactor.value()); if (!shouldWiden(Instr, Range)) return nullptr; @@ -9162,7 +9175,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe( VPRecipeBase * VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction, - ArrayRef Operands) { + ArrayRef Operands, + unsigned ScaleFactor) { assert(Operands.size() == 2 && "Unexpected number of operands for partial reduction"); @@ -9195,7 +9209,7 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction, BinOp = Builder.createSelect(Mask, BinOp, Zero, Reduction->getDebugLoc()); } return new VPPartialReductionRecipe(ReductionOpcode, BinOp, Accumulator, - Reduction); + ScaleFactor, Reduction); } void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF, diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h index 334cfbad8bd7c..fd0064a34c4c9 100644 --- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h +++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h 
@@ -178,7 +178,8 @@ class VPRecipeBuilder { /// Create and return a partial reduction recipe for a reduction instruction /// along with binary operation and reduction phi operands. VPRecipeBase *tryToCreatePartialReduction(Instruction *Reduction, -ArrayRef Operands); +ArrayRef Operands, +unsigned ScaleFactor); /// Set the recipe created for given ingredient. void setRecipe(Instruction *I, VPRecipeBase *R) { diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h index 37e0a176ab1cc..376526e804b4b 100644 --- a/llvm/lib/Transforms/Vectorize/VPlan.h +++ b/llvm/lib/Transforms/Vectorize/VPlan.h @@ -2033,6 +2033,8 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe, /// Generate the phi/select nodes. void execute(VPTransformState &State) override; + unsigned getVFScaleFactor() const { return VFScaleFactor; } + #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) /// Print the recipe. void print(raw_ostream &O, const Twine &Indent, @@ -2063,17 +2065,19 @@ class VPReductionPHIReci
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/SamTebbs33 commented: Apologies for the review requesting noise. https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/133090 >From 9a9164fce2a7fe1d602fd24cf9a9026b06190f31 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 26 Mar 2025 14:01:59 + Subject: [PATCH 1/5] [LV] Reduce register usage for scaled reductions --- .../Transforms/Vectorize/LoopVectorize.cpp| 24 +- .../Transforms/Vectorize/VPRecipeBuilder.h| 3 +- llvm/lib/Transforms/Vectorize/VPlan.h | 14 +- .../partial-reduce-dot-product-neon.ll| 118 -- .../AArch64/partial-reduce-dot-product.ll | 344 +- 5 files changed, 282 insertions(+), 221 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 1dbcbdbe083fe..400a510be308b 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -5019,10 +5019,23 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. 
+auto VF = VFs[J]; +if (auto *ReductionR = dyn_cast(R)) + VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor()); +else if (auto *PartialReductionR = + dyn_cast(R)) + VF = VF.divideCoefficientBy(PartialReductionR->getScaleFactor()); +if (VF != VFs[J]) + LLVM_DEBUG(dbgs() << "LV(REG): Scaled down VF from " << VFs[J] +<< " to " << VF << " for "; + R->dump();); + for (VPValue *DefV : R->definedValues()) { Type *ScalarTy = TypeInfo.inferScalarType(DefV); unsigned ClassID = TTI.getRegisterClassForType(true, ScalarTy); - RegUsage[ClassID] += GetRegUsage(ScalarTy, VFs[J]); + RegUsage[ClassID] += GetRegUsage(ScalarTy, VF); } } } @@ -8951,8 +8964,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe( if (isa(Instr) || isa(Instr)) return tryToWidenMemory(Instr, Operands, Range); - if (getScalingForReduction(Instr)) -return tryToCreatePartialReduction(Instr, Operands); + if (auto ScaleFactor = getScalingForReduction(Instr)) +return tryToCreatePartialReduction(Instr, Operands, ScaleFactor.value()); if (!shouldWiden(Instr, Range)) return nullptr; @@ -8976,7 +8989,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe( VPRecipeBase * VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction, - ArrayRef Operands) { + ArrayRef Operands, + unsigned ScaleFactor) { assert(Operands.size() == 2 && "Unexpected number of operands for partial reduction"); @@ -9009,7 +9023,7 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction, BinOp = Builder.createSelect(Mask, BinOp, Zero, Reduction->getDebugLoc()); } return new VPPartialReductionRecipe(ReductionOpcode, BinOp, Accumulator, - Reduction); + ScaleFactor, Reduction); } void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF, diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h index 334cfbad8bd7c..fd0064a34c4c9 100644 --- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h +++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h 
@@ -178,7 +178,8 @@ class VPRecipeBuilder { /// Create and return a partial reduction recipe for a reduction instruction /// along with binary operation and reduction phi operands. VPRecipeBase *tryToCreatePartialReduction(Instruction *Reduction, -ArrayRef Operands); +ArrayRef Operands, +unsigned ScaleFactor); /// Set the recipe created for given ingredient. void setRecipe(Instruction *I, VPRecipeBase *R) { diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h index 37e0a176ab1cc..376526e804b4b 100644 --- a/llvm/lib/Transforms/Vectorize/VPlan.h +++ b/llvm/lib/Transforms/Vectorize/VPlan.h @@ -2033,6 +2033,8 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe, /// Generate the phi/select nodes. void execute(VPTransformState &State) override; + unsigned getVFScaleFactor() const { return VFScaleFactor; } + #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) /// Print the recipe. void print(raw_ostream &O, const Twine &Indent, @@ -2063,17 +2065,19 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe, /// scalar value. class VPPart
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
SamTebbs33 wrote: Good idea, done. https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -5026,10 +5026,24 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef<ElementCount> VFs, // even in the scalar case. RegUsage[ClassID] += 1; } else { +// The output from scaled phis and scaled reductions actually have +// fewer lanes than the VF. +auto VF = VFs[J]; +if (auto *ReductionR = dyn_cast<VPReductionPHIRecipe>(R)) + VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor()); +else if (auto *PartialReductionR = + dyn_cast<VPPartialReductionRecipe>(R)) + VF = VF.divideCoefficientBy(PartialReductionR->getVFScaleFactor()); +LLVM_DEBUG(if (VF != VFs[J]) { SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -3235,6 +3263,36 @@ void VPWidenPointerInductionRecipe::print(raw_ostream &O, const Twine &Indent, } #endif +void VPAliasLaneMaskRecipe::execute(VPTransformState &State) { + IRBuilderBase Builder = State.Builder; + Value *SinkValue = State.get(getSinkValue(), true); + Value *SourceValue = State.get(getSourceValue(), true); + + auto *Type = SinkValue->getType(); + Value *AliasMask = Builder.CreateIntrinsic( + Intrinsic::experimental_get_alias_lane_mask, + {VectorType::get(Builder.getInt1Ty(), State.VF), Type, + Builder.getInt64Ty()}, + {SourceValue, SinkValue, Builder.getInt64(getAccessedElementSize()), + Builder.getInt1(WriteAfterRead)}, + nullptr, "alias.lane.mask"); + State.set(this, AliasMask, /*IsScalar=*/false); +} + +#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) +void VPAliasLaneMaskRecipe::print(raw_ostream &O, const Twine &Indent, + VPSlotTracker &SlotTracker) const { + O << Indent << "EMIT "; + getVPSingleValue()->printAsOperand(O, SlotTracker); + O << " = alias lane mask "; SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/100579
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -195,6 +195,13 @@ enum class TailFoldingStyle { DataWithEVL, }; +enum class RTCheckStyle { + /// Branch to scalar loop if checks fails at runtime. + ScalarFallback, + /// Form a mask based on elements which won't be a WAR or RAW hazard SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/100579
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -3235,6 +3263,36 @@ void VPWidenPointerInductionRecipe::print(raw_ostream &O, const Twine &Indent, } #endif +void VPAliasLaneMaskRecipe::execute(VPTransformState &State) { + IRBuilderBase Builder = State.Builder; + Value *SinkValue = State.get(getSinkValue(), true); + Value *SourceValue = State.get(getSourceValue(), true); + + auto *Type = SinkValue->getType(); SamTebbs33 wrote: Not needed thanks to rebase. https://github.com/llvm/llvm-project/pull/100579
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -3073,6 +3075,56 @@ struct VPWidenStoreEVLRecipe final : public VPWidenMemoryRecipe { } }; +// Given a pointer A that is being stored to, and pointer B that is being +// read from, both with unknown lengths, create a mask that disables +// elements which could overlap across a loop iteration. For example, if A +// is X and B is X + 2 with VF being 4, only the final two elements of the +// loaded vector can be stored since they don't overlap with the stored +// vector. %b.vec = load %b ; = [s, t, u, v] +// [...] +// store %a, %b.vec ; only u and v can be stored as their addresses don't +// overlap with %a + (VF - 1) SamTebbs33 wrote: Yes you're right, this should say that the *first* two are valid. Thanks for spotting that. I've re-worded the comment to make it more clear. https://github.com/llvm/llvm-project/pull/100579
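The corrected example (store to `X`, load from `X + 2`, VF 4, first two lanes valid) can be checked with a small scalar model of the write-after-read case. This is an illustrative sketch, not the definition of the `@llvm.experimental.get.alias.lane.mask` intrinsic; it assumes the distance is measured from the write pointer to the read pointer in whole elements:

```cpp
#include <cstdint>
#include <vector>

// Scalar sketch of a write-after-read alias lane mask (assumption: lane I is
// kept when the read address is at least I + 1 elements past the write
// address, or when the read address is not ahead of the write address at
// all — matching the X / X + 2 example in the recipe comment).
std::vector<bool> aliasLaneMaskWAR(intptr_t WritePtr, intptr_t ReadPtr,
                                   unsigned EltSize, unsigned VF) {
  intptr_t Diff = (ReadPtr - WritePtr) / (intptr_t)EltSize; // in elements
  std::vector<bool> Mask(VF);
  for (unsigned I = 0; I < VF; ++I)
    Mask[I] = Diff <= 0 || (intptr_t)I < Diff;
  return Mask;
}
```

With a write at `X` and a read at `X + 2` elements, `Diff` is 2 and the mask comes out `{1, 1, 0, 0}` — exactly the "first two are valid" correction in the reply.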
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -1416,14 +1466,14 @@ void VPlanTransforms::addActiveLaneMask( auto *FoundWidenCanonicalIVUser = find_if(Plan.getCanonicalIV()->users(), [](VPUser *U) { return isa<VPWidenCanonicalIVRecipe>(U); }); - assert(FoundWidenCanonicalIVUser && + assert(FoundWidenCanonicalIVUser && *FoundWidenCanonicalIVUser && SamTebbs33 wrote: Done, thanks. https://github.com/llvm/llvm-project/pull/100579
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -1300,14 +1301,38 @@ static VPActiveLaneMaskPHIRecipe *addVPLaneMaskPhiAndUpdateExitBranch( cast<VPInstruction>(CanonicalIVPHI->getBackedgeValue()); // TODO: Check if dropping the flags is needed if // !DataAndControlFlowWithoutRuntimeCheck. + VPValue *IncVal = CanonicalIVIncrement->getOperand(1); + assert(IncVal != CanonicalIVPHI && "Unexpected operand order"); + CanonicalIVIncrement->dropPoisonGeneratingFlags(); DebugLoc DL = CanonicalIVIncrement->getDebugLoc(); + // We can't use StartV directly in the ActiveLaneMask VPInstruction, since // we have to take unrolling into account. Each part needs to start at // Part * VF auto *VecPreheader = Plan.getVectorPreheader(); VPBuilder Builder(VecPreheader); + // Create an alias mask for each possibly-aliasing pointer pair. If there + // are multiple they are combined together with ANDs. + VPValue *AliasMask = nullptr; + + for (auto C : RTChecks) { +// FIXME: How to pass this info back? +//HasAliasMask = true; SamTebbs33 wrote: The info is actually being passed back so I can remove this FIXME. Done. https://github.com/llvm/llvm-project/pull/100579
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -3073,6 +3075,56 @@ struct VPWidenStoreEVLRecipe final : public VPWidenMemoryRecipe { } }; +// Given a pointer A that is being stored to, and pointer B that is being +// read from, both with unknown lengths, create a mask that disables +// elements which could overlap across a loop iteration. For example, if A +// is X and B is X + 2 with VF being 4, only the final two elements of the +// loaded vector can be stored since they don't overlap with the stored +// vector. %b.vec = load %b ; = [s, t, u, v] +// [...] +// store %a, %b.vec ; only u and v can be stored as their addresses don't +// overlap with %a + (VF - 1) +class VPAliasLaneMaskRecipe : public VPSingleDefRecipe { + +public: + VPAliasLaneMaskRecipe(VPValue *Src, VPValue *Sink, unsigned ElementSize, +bool WriteAfterRead) + : VPSingleDefRecipe(VPDef::VPAliasLaneMaskSC, {Src, Sink}), +ElementSize(ElementSize), WriteAfterRead(WriteAfterRead) {} + + ~VPAliasLaneMaskRecipe() override = default; + + VPAliasLaneMaskRecipe *clone() override { +return new VPAliasLaneMaskRecipe(getSourceValue(), getSinkValue(), + ElementSize, WriteAfterRead); + } + + VP_CLASSOF_IMPL(VPDef::VPAliasLaneMaskSC); + + void execute(VPTransformState &State) override; + + /// Get the VPValue* for the pointer being read from + VPValue *getSourceValue() const { return getOperand(0); } + + // Get the size of the element(s) accessed by the pointers + unsigned getAccessedElementSize() const { return ElementSize; } + + /// Get the VPValue* for the pointer being stored to + VPValue *getSinkValue() const { return getOperand(1); } + + bool isWriteAfterRead() const { return WriteAfterRead; } + +private: + unsigned ElementSize; + bool WriteAfterRead; + +#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) + /// Print the recipe. + void print(raw_ostream &O, const Twine &Indent, SamTebbs33 wrote: Done. 
https://github.com/llvm/llvm-project/pull/100579
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -77,9 +77,13 @@ struct VPlanTransforms { /// creation) and instead it is handled using active-lane-mask. \p /// DataAndControlFlowWithoutRuntimeCheck implies \p /// UseActiveLaneMaskForControlFlow. + /// RTChecks refers to the pointer pairs that need aliasing elements to be + /// masked off each loop iteration. SamTebbs33 wrote: Added, let me know if anything about it should change. https://github.com/llvm/llvm-project/pull/100579
[llvm-branch-commits] [llvm] [LV] Mask off possibly aliasing vector lanes (PR #100579)
@@ -1331,14 +1356,37 @@ static VPActiveLaneMaskPHIRecipe *addVPLaneMaskPhiAndUpdateExitBranch( "index.part.next"); // Create the active lane mask instruction in the VPlan preheader. - auto *EntryALM = + VPValue *Mask = Builder.createNaryOp(VPInstruction::ActiveLaneMask, {EntryIncrement, TC}, DL, "active.lane.mask.entry"); // Now create the ActiveLaneMaskPhi recipe in the main loop using the // preheader ActiveLaneMask instruction. - auto *LaneMaskPhi = new VPActiveLaneMaskPHIRecipe(EntryALM, DebugLoc()); + auto *LaneMaskPhi = new VPActiveLaneMaskPHIRecipe(Mask, DebugLoc()); LaneMaskPhi->insertAfter(CanonicalIVPHI); + VPValue *LaneMask = LaneMaskPhi; + if (AliasMask) { +// Increment phi by correct amount. +Builder.setInsertPoint(CanonicalIVIncrement); + +VPValue *IncrementBy = Builder.createNaryOp(VPInstruction::PopCount, +{AliasMask}, DL, "popcount"); +Type *IVType = CanonicalIVPHI->getScalarType(); + +if (IVType->getScalarSizeInBits() < 64) { + auto *Cast = + new VPScalarCastRecipe(Instruction::Trunc, IncrementBy, IVType); + Cast->insertAfter(IncrementBy->getDefiningRecipe()); + IncrementBy = Cast; +} +CanonicalIVIncrement->setOperand(1, IncrementBy); + +// And the alias mask so the iteration only processes non-aliasing lanes +Builder.setInsertPoint(CanonicalIVPHI->getParent(), + CanonicalIVPHI->getParent()->getFirstNonPhi()); +LaneMask = Builder.createNaryOp(Instruction::BinaryOps::And, +{LaneMaskPhi, AliasMask}, DL); SamTebbs33 wrote: We don't, and there's actually a case in the test suite that hangs because the mask is all-false. I'll start looking into a solution for that. https://github.com/llvm/llvm-project/pull/100579 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
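The hang acknowledged in the reply above is easy to reproduce in a scalar model of the transformed loop: the canonical IV now advances by `popcount(alias mask)` instead of by VF, so an all-false mask means the IV never moves. A hypothetical sketch of the trip accounting (names invented, not the VPlan code itself):

```cpp
#include <algorithm>
#include <vector>

// Sketch of the transformed loop's trip accounting: each vector iteration
// retires only popcount(alias mask) elements rather than a full VF.
// Returns the number of vector iterations, or -1 to flag the all-false-mask
// case that would otherwise spin forever (the open issue in the thread).
int vectorIterations(unsigned TripCount, const std::vector<bool> &AliasMask) {
  unsigned Step =
      (unsigned)std::count(AliasMask.begin(), AliasMask.end(), true);
  if (Step == 0)
    return -1; // all lanes alias: the IV would never advance
  unsigned IV = 0;
  int Iters = 0;
  while (IV < TripCount) {
    IV += Step; // CanonicalIVIncrement's operand is the popcount
    ++Iters;
  }
  return Iters;
}
```

With a trip count of 10 and two non-aliasing lanes per iteration, the loop takes 5 vector iterations; with zero non-aliasing lanes it flags the degenerate case instead of looping.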
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/136997 >From 10c4727074a7f5b4502ad08dc655be8fa5ffa3d2 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Wed, 23 Apr 2025 13:16:38 +0100 Subject: [PATCH 1/5] [LoopVectorizer] Bundle partial reductions with different extensions This PR adds support for extensions of different signedness to VPMulAccumulateReductionRecipe and allows such partial reductions to be bundled into that class. --- llvm/lib/Transforms/Vectorize/VPlan.h | 42 +- .../lib/Transforms/Vectorize/VPlanRecipes.cpp | 27 ++--- .../Transforms/Vectorize/VPlanTransforms.cpp | 25 - .../partial-reduce-dot-product-mixed.ll | 56 +-- .../LoopVectorize/AArch64/vplan-printing.ll | 29 +- 5 files changed, 99 insertions(+), 80 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h index 20d272e69e6e7..e11f608d068da 100644 --- a/llvm/lib/Transforms/Vectorize/VPlan.h +++ b/llvm/lib/Transforms/Vectorize/VPlan.h @@ -2493,11 +2493,13 @@ class VPExtendedReductionRecipe : public VPReductionRecipe { /// recipe is abstract and needs to be lowered to concrete recipes before /// codegen. The Operands are {ChainOp, VecOp1, VecOp2, [Condition]}. class VPMulAccumulateReductionRecipe : public VPReductionRecipe { - /// Opcode of the extend recipe. - Instruction::CastOps ExtOp; + /// Opcodes of the extend recipes. + Instruction::CastOps ExtOp0; + Instruction::CastOps ExtOp1; - /// Non-neg flag of the extend recipe. - bool IsNonNeg = false; + /// Non-neg flags of the extend recipe. 
+ bool IsNonNeg0 = false; + bool IsNonNeg1 = false; Type *ResultTy; @@ -2512,7 +2514,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { MulAcc->getCondOp(), MulAcc->isOrdered(), WrapFlagsTy(MulAcc->hasNoUnsignedWrap(), MulAcc->hasNoSignedWrap()), MulAcc->getDebugLoc()), -ExtOp(MulAcc->getExtOpcode()), IsNonNeg(MulAcc->isNonNeg()), +ExtOp0(MulAcc->getExt0Opcode()), ExtOp1(MulAcc->getExt1Opcode()), +IsNonNeg0(MulAcc->isNonNeg0()), IsNonNeg1(MulAcc->isNonNeg1()), ResultTy(MulAcc->getResultType()), IsPartialReduction(MulAcc->isPartialReduction()) {} @@ -2526,7 +2529,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { R->getCondOp(), R->isOrdered(), WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()), R->getDebugLoc()), -ExtOp(Ext0->getOpcode()), IsNonNeg(Ext0->isNonNeg()), +ExtOp0(Ext0->getOpcode()), ExtOp1(Ext1->getOpcode()), +IsNonNeg0(Ext0->isNonNeg()), IsNonNeg1(Ext1->isNonNeg()), ResultTy(ResultTy), IsPartialReduction(isa(R)) { assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) == @@ -2542,7 +2546,8 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { R->getCondOp(), R->isOrdered(), WrapFlagsTy(Mul->hasNoUnsignedWrap(), Mul->hasNoSignedWrap()), R->getDebugLoc()), -ExtOp(Instruction::CastOps::CastOpsEnd) { +ExtOp0(Instruction::CastOps::CastOpsEnd), +ExtOp1(Instruction::CastOps::CastOpsEnd) { assert(RecurrenceDescriptor::getOpcode(getRecurrenceKind()) == Instruction::Add && "The reduction instruction in MulAccumulateReductionRecipe must be " @@ -2586,19 +2591,26 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { VPValue *getVecOp1() const { return getOperand(2); } /// Return if this MulAcc recipe contains extend instructions. - bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; } + bool isExtended() const { return ExtOp0 != Instruction::CastOps::CastOpsEnd; } /// Return if the operands of mul instruction come from same extend. 
- bool isSameExtend() const { return getVecOp0() == getVecOp1(); } + bool isSameExtendVal() const { return getVecOp0() == getVecOp1(); } - /// Return the opcode of the underlying extend. - Instruction::CastOps getExtOpcode() const { return ExtOp; } + /// Return the opcode of the underlying extends. + Instruction::CastOps getExt0Opcode() const { return ExtOp0; } + Instruction::CastOps getExt1Opcode() const { return ExtOp1; } + + /// Return if the first extend's opcode is ZExt. + bool isZExt0() const { return ExtOp0 == Instruction::CastOps::ZExt; } + + /// Return if the second extend's opcode is ZExt. + bool isZExt1() const { return ExtOp1 == Instruction::CastOps::ZExt; } - /// Return if the extend opcode is ZExt. - bool isZExt() const { return ExtOp == Instruction::CastOps::ZExt; } + /// Return the non negative flag of the first ext recipe. + bool isNonNeg0() const { return IsNonNeg0; } - /// Return the non negative flag of the ext recipe. - bool isNonNeg() const { return IsNonNeg; } + /// Return the non negative flag of the second
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
@@ -2586,22 +2590,21 @@ class VPMulAccumulateReductionRecipe : public VPReductionRecipe { VPValue *getVecOp1() const { return getOperand(2); } /// Return if this MulAcc recipe contains extend instructions. - bool isExtended() const { return ExtOp != Instruction::CastOps::CastOpsEnd; } + bool isExtended() const { +return getVecOp0Info().ExtOp != Instruction::CastOps::CastOpsEnd; SamTebbs33 wrote: That can't happen at the moment, but I think you're right and it's worth considering the other extension as well. Done. https://github.com/llvm/llvm-project/pull/136997 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions inside VPMulAccumulateReductionRecipe (PR #136173)
SamTebbs33 wrote: Ping :) https://github.com/llvm/llvm-project/pull/136173

[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
SamTebbs33 wrote: Superseded by https://github.com/llvm/llvm-project/pull/144908 https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
https://github.com/SamTebbs33 closed https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LoopVectorizer] Bundle partial reductions with different extensions (PR #136997)
SamTebbs33 wrote: Really sorry for the spam again, I pushed to the user branch in my fork rather than the base branch in llvm :facepalm: https://github.com/llvm/llvm-project/pull/136997
[llvm-branch-commits] [llvm] [LV] Use VPReductionRecipe for partial reductions (PR #146073)
https://github.com/SamTebbs33 closed https://github.com/llvm/llvm-project/pull/146073
[llvm-branch-commits] [llvm] [LV] Use VPReductionRecipe for partial reductions (PR #146073)
SamTebbs33 wrote: Closed in favour of a PR based on top of https://github.com/llvm/llvm-project/pull/147302 https://github.com/llvm/llvm-project/pull/146073
[llvm-branch-commits] [llvm] [LV] Use VPReductionRecipe for partial reductions (PR #146073)
@@ -2744,6 +2702,12 @@ class VPSingleDefBundleRecipe : public VPSingleDefRecipe { /// vector operands, performing a reduction.add on the result, and adding /// the scalar result to a chain. MulAccumulateReduction, +/// Represent an inloop multiply-accumulate reduction, multiplying the +/// extended vector operands, negating the multiplication, performing a +/// reduction.add +/// on the result, and adding +/// the scalar result to a chain. +ExtNegatedMulAccumulateReduction, SamTebbs33 wrote: Thanks Florian, that sounds like a good approach. https://github.com/llvm/llvm-project/pull/146073
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
https://github.com/SamTebbs33 created https://github.com/llvm/llvm-project/pull/147255 This PR bundles sub reductions into the VPExpressionRecipe class and adjusts the cost functions to take the negation into account. >From 1a5f4e42e4f9d1eae0222302dcabdf08492f67c3 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Mon, 30 Jun 2025 14:29:54 +0100 Subject: [PATCH] [LV] Bundle sub reductions into VPExpressionRecipe This PR bundles sub reductions into the VPExpressionRecipe class and adjusts the cost functions to take the negation into account. --- .../llvm/Analysis/TargetTransformInfo.h | 4 +- .../llvm/Analysis/TargetTransformInfoImpl.h | 2 +- llvm/include/llvm/CodeGen/BasicTTIImpl.h | 3 + llvm/lib/Analysis/TargetTransformInfo.cpp | 5 +- .../AArch64/AArch64TargetTransformInfo.cpp| 7 +- .../AArch64/AArch64TargetTransformInfo.h | 2 +- .../lib/Target/ARM/ARMTargetTransformInfo.cpp | 7 +- llvm/lib/Target/ARM/ARMTargetTransformInfo.h | 1 + .../Transforms/Vectorize/LoopVectorize.cpp| 6 +- llvm/lib/Transforms/Vectorize/VPlan.h | 11 ++ .../lib/Transforms/Vectorize/VPlanRecipes.cpp | 35 - .../Transforms/Vectorize/VPlanTransforms.cpp | 33 ++-- .../Transforms/Vectorize/VectorCombine.cpp| 4 +- .../vplan-printing-reductions.ll | 143 ++ 14 files changed, 236 insertions(+), 27 deletions(-) diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h index c43870392361d..3cc0ea01953c3 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfo.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h @@ -1645,8 +1645,10 @@ class TargetTransformInfo { /// extensions. This is the cost of as: /// ResTy vecreduce.add(mul (A, B)). /// ResTy vecreduce.add(mul(ext(Ty A), ext(Ty B)). + /// The multiply can optionally be negated, which signifies that it is a sub + /// reduction. 
LLVM_ABI InstructionCost getMulAccReductionCost( - bool IsUnsigned, Type *ResTy, VectorType *Ty, + bool IsUnsigned, Type *ResTy, VectorType *Ty, bool Negated, TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) const; /// Calculate the cost of an extended reduction pattern, similar to diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h index 12f87226c5f57..fd22981a5dbf3 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h @@ -960,7 +960,7 @@ class TargetTransformInfoImplBase { virtual InstructionCost getMulAccReductionCost(bool IsUnsigned, Type *ResTy, VectorType *Ty, - TTI::TargetCostKind CostKind) const { + bool Negated, TTI::TargetCostKind CostKind) const { return 1; } diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h index bf958e100f2ac..a9c9fa6d1db0d 100644 --- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h +++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h @@ -3116,7 +3116,10 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase { InstructionCost getMulAccReductionCost(bool IsUnsigned, Type *ResTy, VectorType *Ty, + bool Negated, TTI::TargetCostKind CostKind) const override { +if (Negated) + return InstructionCost::getInvalid(CostKind); // Without any native support, this is equivalent to the cost of // vecreduce.add(mul(ext(Ty A), ext(Ty B))) or // vecreduce.add(mul(A, B)). 
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp index 3ebd9d487ba04..ba0d070bffe6d 100644 --- a/llvm/lib/Analysis/TargetTransformInfo.cpp +++ b/llvm/lib/Analysis/TargetTransformInfo.cpp @@ -1274,9 +1274,10 @@ InstructionCost TargetTransformInfo::getExtendedReductionCost( } InstructionCost TargetTransformInfo::getMulAccReductionCost( -bool IsUnsigned, Type *ResTy, VectorType *Ty, +bool IsUnsigned, Type *ResTy, VectorType *Ty, bool Negated, TTI::TargetCostKind CostKind) const { - return TTIImpl->getMulAccReductionCost(IsUnsigned, ResTy, Ty, CostKind); + return TTIImpl->getMulAccReductionCost(IsUnsigned, ResTy, Ty, Negated, + CostKind); } InstructionCost diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp index 380faa6cf6939..d9a367535baf4 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp @@ -5316,8 +5316,10 @@ InstructionCost AArch64TTIImpl::getExtendedReductionCost( InstructionCost AArch64TTIImpl::getMulAccReductionCost(bool IsUnsigned, Type *ResTy, - VectorType *VecTy, + VectorType *VecTy, bo
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
https://github.com/SamTebbs33 updated https://github.com/llvm/llvm-project/pull/147255 >From 1a5f4e42e4f9d1eae0222302dcabdf08492f67c3 Mon Sep 17 00:00:00 2001 From: Samuel Tebbs Date: Mon, 30 Jun 2025 14:29:54 +0100 Subject: [PATCH 1/2] [LV] Bundle sub reductions into VPExpressionRecipe This PR bundles sub reductions into the VPExpressionRecipe class and adjusts the cost functions to take the negation into account. --- .../llvm/Analysis/TargetTransformInfo.h | 4 +- .../llvm/Analysis/TargetTransformInfoImpl.h | 2 +- llvm/include/llvm/CodeGen/BasicTTIImpl.h | 3 + llvm/lib/Analysis/TargetTransformInfo.cpp | 5 +- .../AArch64/AArch64TargetTransformInfo.cpp| 7 +- .../AArch64/AArch64TargetTransformInfo.h | 2 +- .../lib/Target/ARM/ARMTargetTransformInfo.cpp | 7 +- llvm/lib/Target/ARM/ARMTargetTransformInfo.h | 1 + .../Transforms/Vectorize/LoopVectorize.cpp| 6 +- llvm/lib/Transforms/Vectorize/VPlan.h | 11 ++ .../lib/Transforms/Vectorize/VPlanRecipes.cpp | 35 - .../Transforms/Vectorize/VPlanTransforms.cpp | 33 ++-- .../Transforms/Vectorize/VectorCombine.cpp| 4 +- .../vplan-printing-reductions.ll | 143 ++ 14 files changed, 236 insertions(+), 27 deletions(-) diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h index c43870392361d..3cc0ea01953c3 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfo.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h @@ -1645,8 +1645,10 @@ class TargetTransformInfo { /// extensions. This is the cost of as: /// ResTy vecreduce.add(mul (A, B)). /// ResTy vecreduce.add(mul(ext(Ty A), ext(Ty B)). + /// The multiply can optionally be negated, which signifies that it is a sub + /// reduction. 
LLVM_ABI InstructionCost getMulAccReductionCost( - bool IsUnsigned, Type *ResTy, VectorType *Ty, + bool IsUnsigned, Type *ResTy, VectorType *Ty, bool Negated, TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) const; /// Calculate the cost of an extended reduction pattern, similar to diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h index 12f87226c5f57..fd22981a5dbf3 100644 --- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h +++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h @@ -960,7 +960,7 @@ class TargetTransformInfoImplBase { virtual InstructionCost getMulAccReductionCost(bool IsUnsigned, Type *ResTy, VectorType *Ty, - TTI::TargetCostKind CostKind) const { + bool Negated, TTI::TargetCostKind CostKind) const { return 1; } diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h index bf958e100f2ac..a9c9fa6d1db0d 100644 --- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h +++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h @@ -3116,7 +3116,10 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase { InstructionCost getMulAccReductionCost(bool IsUnsigned, Type *ResTy, VectorType *Ty, + bool Negated, TTI::TargetCostKind CostKind) const override { +if (Negated) + return InstructionCost::getInvalid(CostKind); // Without any native support, this is equivalent to the cost of // vecreduce.add(mul(ext(Ty A), ext(Ty B))) or // vecreduce.add(mul(A, B)). 
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp index 3ebd9d487ba04..ba0d070bffe6d 100644 --- a/llvm/lib/Analysis/TargetTransformInfo.cpp +++ b/llvm/lib/Analysis/TargetTransformInfo.cpp @@ -1274,9 +1274,10 @@ InstructionCost TargetTransformInfo::getExtendedReductionCost( } InstructionCost TargetTransformInfo::getMulAccReductionCost( -bool IsUnsigned, Type *ResTy, VectorType *Ty, +bool IsUnsigned, Type *ResTy, VectorType *Ty, bool Negated, TTI::TargetCostKind CostKind) const { - return TTIImpl->getMulAccReductionCost(IsUnsigned, ResTy, Ty, CostKind); + return TTIImpl->getMulAccReductionCost(IsUnsigned, ResTy, Ty, Negated, + CostKind); } InstructionCost diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp index 380faa6cf6939..d9a367535baf4 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp @@ -5316,8 +5316,10 @@ InstructionCost AArch64TTIImpl::getExtendedReductionCost( InstructionCost AArch64TTIImpl::getMulAccReductionCost(bool IsUnsigned, Type *ResTy, - VectorType *VecTy, + VectorType *VecTy, bool Negated, TTI::TargetCostKind CostKind) const { + if (Negated) +return Instruction
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -2725,6 +2729,31 @@ void VPExpressionRecipe::print(raw_ostream &O, const Twine &Indent, O << ")"; break; } + case ExpressionTypes::ExtNegatedMulAccReduction: { SamTebbs33 wrote: That was my initial approach but it required checking the number of operands to know if there was a sub or not, and I was asked to create an expression type to not rely on operand ordering being stable. https://github.com/llvm/llvm-project/pull/147255
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -1645,8 +1645,10 @@ class TargetTransformInfo { /// extensions. This is the cost of as: /// ResTy vecreduce.add(mul (A, B)). /// ResTy vecreduce.add(mul(ext(Ty A), ext(Ty B)). + /// The multiply can optionally be negated, which signifies that it is a sub + /// reduction. LLVM_ABI InstructionCost getMulAccReductionCost( - bool IsUnsigned, Type *ResTy, VectorType *Ty, + bool IsUnsigned, Type *ResTy, VectorType *Ty, bool Negated, SamTebbs33 wrote: Good idea, done. https://github.com/llvm/llvm-project/pull/147255
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -5538,7 +5538,7 @@ LoopVectorizationCostModel::getReductionPatternCost(Instruction *I, TTI::CastContextHint::None, CostKind, RedOp); InstructionCost RedCost = TTI.getMulAccReductionCost( -IsUnsigned, RdxDesc.getRecurrenceType(), ExtType, CostKind); +IsUnsigned, RdxDesc.getRecurrenceType(), ExtType, false, CostKind); SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/147255
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -3116,7 +3116,10 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase { InstructionCost getMulAccReductionCost(bool IsUnsigned, Type *ResTy, VectorType *Ty, + bool Negated, TTI::TargetCostKind CostKind) const override { +if (Negated) SamTebbs33 wrote: Thanks, I've added a cost for the sub. https://github.com/llvm/llvm-project/pull/147255
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -2757,6 +2757,12 @@ class VPExpressionRecipe : public VPSingleDefRecipe { /// vector operands, performing a reduction.add on the result, and adding /// the scalar result to a chain. MulAccReduction, +/// Represent an inloop multiply-accumulate reduction, multiplying the +/// extended vector operands, negating the multiplication, performing a +/// reduction.add +/// on the result, and adding SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/147255
[llvm-branch-commits] [llvm] [LV] Bundle sub reductions into VPExpressionRecipe (PR #147255)
@@ -1401,8 +1401,8 @@ static void analyzeCostOfVecReduction(const IntrinsicInst &II, TTI::CastContextHint::None, CostKind, RedOp); CostBeforeReduction = ExtCost * 2 + MulCost + Ext2Cost; -CostAfterReduction = -TTI.getMulAccReductionCost(IsUnsigned, II.getType(), ExtType, CostKind); +CostAfterReduction = TTI.getMulAccReductionCost(IsUnsigned, II.getType(), +ExtType, false, CostKind); SamTebbs33 wrote: Done. https://github.com/llvm/llvm-project/pull/147255