[llvm-branch-commits] [AArch64][GlobalISel] Improve codegen for G_VECREDUCE_{SMIN, SMAX, UMIN, UMAX} for odd-sized vectors (PR #81831)
https://github.com/dc03-work created https://github.com/llvm/llvm-project/pull/81831

`i8` vectors do not have their sizes changed, as I noticed regressions in some tests when that was done.

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
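To illustrate the general idea behind widening an odd-sized vector before a min/max reduction (this is only a sketch of the concept, not the patch's actual lowering): padding lanes must hold the reduction's identity element so they cannot affect the result. The function name below is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Conceptual model: a v3s32 G_VECREDUCE_SMAX can be widened to v4s32 by
// filling the new lane with INT_MIN, the identity for signed max, so the
// padded lane never wins the reduction.
int vecreduce_smax(std::vector<int> lanes) {
  while (lanes.size() % 2 != 0)  // pad odd element counts to an even width
    lanes.push_back(INT_MIN);    // identity element for smax
  return *std::max_element(lanes.begin(), lanes.end());
}
```

The analogous identities would be INT_MAX for smin, 0 for umax, and UINT_MAX for umin.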
[llvm-branch-commits] [AArch64][GlobalISel] Improve codegen for G_VECREDUCE_{SMIN, SMAX, UMIN, UMAX} for odd-sized vectors (PR #81831)
dc03-work wrote:

This PR is stacked on top of https://github.com/llvm/llvm-project/pull/81830. Sorry for the long branch names on both; I do not know how to change the default branch names with spr.

https://github.com/llvm/llvm-project/pull/81831
[llvm-branch-commits] [AArch64][GlobalISel] Improve codegen for G_VECREDUCE_{SMIN, SMAX, UMIN, UMAX} for odd-sized vectors (PR #81831)
@@ -1070,6 +1070,13 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
                                  {s16, v8s16},
                                  {s32, v2s32},
                                  {s32, v4s32}})
+      .moreElementsIf(

dc03-work wrote:

As I noted in my commit message, that unfortunately causes regressions for odd-sized `i8` vectors: https://gist.github.com/dc03-work/3d749a7be0dc893d86d2df0fbc31709a (except for the very last case, for some reason). I was noticing another test failure when I enabled it for even-sized vectors, but that seems to have gone away now.

https://github.com/llvm/llvm-project/pull/81831
[llvm-branch-commits] [AArch64][GlobalISel] Improve codegen for G_VECREDUCE_{SMIN, SMAX, UMIN, UMAX} for odd-sized vectors (PR #81831)
https://github.com/dc03-work closed https://github.com/llvm/llvm-project/pull/81831
[llvm-branch-commits] [AArch64][GlobalISel] Improve codegen for G_VECREDUCE_{SMIN, SMAX, UMIN, UMAX} for odd-sized vectors (PR #81831)
dc03-work wrote:

Merged with #81830 in #82740

https://github.com/llvm/llvm-project/pull/81831
[llvm-branch-commits] [AArch64][GlobalISel] Avoid splitting loads of large vector types into individual element loads (PR #85037)
https://github.com/dc03-work closed https://github.com/llvm/llvm-project/pull/85037
[llvm-branch-commits] [AArch64][GlobalISel] Avoid splitting loads of large vector types into individual element loads (PR #85037)
dc03-work wrote:

Looks like I messed up with spr...

https://github.com/llvm/llvm-project/pull/85037
[llvm-branch-commits] [AArch64][GlobalISel] Avoid splitting loads of large vector types into individual element loads (PR #85039)
https://github.com/dc03-work created https://github.com/llvm/llvm-project/pull/85039

This patch adds custom legalization for G_LOAD where it splits loads of fixed-width vector types larger than 128 bits into loads of 128-bit vectors with the same element type. This is an improvement over the previous behavior, where loads were split into individual loads for each element of the vector.
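The chunking arithmetic that the patch description implies can be sketched as follows (names are illustrative, not taken from the patch): a load of an N-element vector wider than 128 bits becomes some number of full 128-bit chunk loads, plus one narrower load for any leftover elements.

```cpp
#include <cassert>

// Sketch of the split computation for a large fixed-width vector load.
struct LoadSplit {
  unsigned NumVecs;     // number of full 128-bit chunk loads
  unsigned ExtraElems;  // elements left over for a final narrow load
};

LoadSplit splitLargeVectorLoad(unsigned NumElts, unsigned EltSizeInBits) {
  // Only element types that divide 128 bits evenly can form whole chunks.
  assert(128 % EltSizeInBits == 0 && "element type must divide 128 bits");
  unsigned EltsPerChunk = 128 / EltSizeInBits;
  return {NumElts / EltsPerChunk, NumElts % EltsPerChunk};
}
```

For example, under this model a v6s64 load (384 bits) would become three 2 x s64 loads, and a v5s32 load would become one 4 x s32 load plus a single-element remainder.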
[llvm-branch-commits] [AArch64][GlobalISel] Avoid splitting loads of large vector types into individual element loads (PR #85039)
dc03-work wrote:

This patch is meant to be stacked on https://github.com/llvm/llvm-project/pull/85038. It looks like spr did something weird here...

https://github.com/llvm/llvm-project/pull/85039
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Avoid splitting loads of large vector types into individual element loads (PR #85039)
https://github.com/dc03-work edited https://github.com/llvm/llvm-project/pull/85039
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Avoid splitting loads of large vector types into individual element loads (PR #85042)
https://github.com/dc03-work created https://github.com/llvm/llvm-project/pull/85042

This patch adds custom legalization for G_LOAD where it splits loads of fixed-width vector types larger than 128 bits into loads of 128-bit vectors with the same element type. This is an improvement over the previous behavior, where loads were split into individual loads for each element of the vector.

>From 266db91b2479047e1c264fce1f527282edd3f17f Mon Sep 17 00:00:00 2001
From: Dhruv Chawla
Date: Wed, 13 Mar 2024 10:36:35 +0530
Subject: [PATCH] [AArch64][GlobalISel] Avoid splitting loads of large vector
 types into individual element loads

This patch adds custom legalization for G_LOAD where it splits loads of
fixed-width vector types larger than 128 bits into loads of 128-bit
vectors with the same element type. This is an improvement to what was
being done before where loads would be split into individual loads for
each element of the vector.
---
 .../AArch64/GISel/AArch64LegalizerInfo.cpp |   70 +
 .../GlobalISel/legalize-load-store.mir     |   41 +-
 llvm/test/CodeGen/AArch64/vecreduce-add.ll | 1476 +++--
 3 files changed, 290 insertions(+), 1297 deletions(-)

diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 36adada2796531..fc1063b6bd4893 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -356,6 +356,12 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
         return Query.Types[0] == s128 &&
                Query.MMODescrs[0].Ordering != AtomicOrdering::NotAtomic;
       })
+      .customIf([=](const LegalityQuery &Query) {
+        // We need custom legalization for loads greater than 128-bits as they
+        // need to be split up into chunks.
+        return Query.Types[0].isFixedVector() &&
+               Query.Types[0].getSizeInBits() > 128;
+      })
       .legalForTypesWithMemDesc({{s8, p0, s8, 8},
                                  {s16, p0, s16, 8},
                                  {s32, p0, s32, 8},
@@ -1632,6 +1638,70 @@ bool AArch64LegalizerInfo::legalizeLoadStore(
   Register ValReg = MI.getOperand(0).getReg();
   const LLT ValTy = MRI.getType(ValReg);
 
+  if (ValTy.isFixedVector() && ValTy.getSizeInBits() > 128) {
+    // Break fixed-width vector loads of sizes greater than 128 bits into
+    // chunks of 128-bit vector loads with the same element type.
+    Register LoadReg = MI.getOperand(1).getReg();
+    Register LoadRegWithOffset = LoadReg;
+
+    unsigned EltSize = ValTy.getScalarSizeInBits();
+    // Only support element types which can cleanly divide into 128-bit wide
+    // vectors.
+    if (128 % EltSize != 0)
+      return false;
+
+    unsigned NewEltCount = 128 / EltSize;
+    LLT NewTy = LLT::fixed_vector(NewEltCount, ValTy.getElementType());
+
+    unsigned OldEltCount = ValTy.getNumElements();
+    unsigned NumVecs = OldEltCount / NewEltCount;
+
+    // Create registers to represent each element of ValReg. Load into these,
+    // then combine them at the end.
+    SmallVector<Register> ComponentRegs;
+    for (unsigned i = 0, e = ValTy.getNumElements(); i != e; i++)
+      ComponentRegs.push_back(
+          MRI.createGenericVirtualRegister(ValTy.getElementType()));
+
+    MachineMemOperand &MMO = **MI.memoperands_begin();
+    auto GetMMO = [&MMO, &MI](int64_t Offset, LLT Ty) {
+      return MI.getMF()->getMachineMemOperand(&MMO, Offset, Ty);
+    };
+
+    for (unsigned i = 0, e = NumVecs; i != e; i++) {
+      auto LoadChunk = MIRBuilder.buildLoad(
+          NewTy, LoadRegWithOffset, *GetMMO(i * NewTy.getSizeInBytes(), NewTy));
+
+      auto LoadOffset = MIRBuilder.buildConstant(
+          LLT::scalar(64), (i + 1) * NewTy.getSizeInBytes());
+
+      LoadRegWithOffset =
+          MIRBuilder.buildPtrAdd(MRI.getType(LoadReg), LoadReg, LoadOffset)
+              .getReg(0);
+
+      Register *ChunkFirstReg = ComponentRegs.begin() + (i * NewEltCount);
+      MIRBuilder.buildUnmerge({ChunkFirstReg, ChunkFirstReg + NewEltCount},
+                              LoadChunk.getReg(0));
+    }
+
+    unsigned ExtraElems = OldEltCount % NewEltCount;
+    if (ExtraElems != 0) {
+      LLT ExtraTy = LLT::fixed_vector(ExtraElems, ValTy.getElementType());
+
+      auto ExtraLoadChunk = MIRBuilder.buildLoad(
+          ExtraTy, LoadRegWithOffset,
+          *GetMMO(NumVecs * NewTy.getSizeInBytes(), ExtraTy));
+
+      MIRBuilder.buildUnmerge({ComponentRegs.begin() + (NumVecs * NewEltCount),
+                               ComponentRegs.end()},
+                              ExtraLoadChunk.getReg(0));
+    }
+
+    MIRBuilder.buildBuildVector(ValReg, ComponentRegs);
+    MI.eraseFromParent();
+    return true;
+  }
+
   if (ValTy == LLT::scalar(128)) {
     AtomicOrdering Ordering = (*MI.memoperands_begin())->getSuccessOrdering();
diff --git a/llvm/test/CodeGen/AArch64/GlobalISe
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Avoid splitting loads of large vector types into individual element loads (PR #85039)
https://github.com/dc03-work closed https://github.com/llvm/llvm-project/pull/85039
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Avoid splitting loads of large vector types into individual element loads (PR #85039)
dc03-work wrote:

Okay... I give up on trying to fix this through spr... I'll create my stacks manually next time 😞

https://github.com/llvm/llvm-project/pull/85039
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Avoid splitting loads of large vector types into individual element loads (PR #85042)
dc03-work wrote:

This PR is actually stacked on https://github.com/llvm/llvm-project/pull/85038. Sorry for the noise earlier.

https://github.com/llvm/llvm-project/pull/85042
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Avoid splitting loads of large vector types into individual element loads (PR #85042)
https://github.com/dc03-work updated https://github.com/llvm/llvm-project/pull/85042

>From ec953a06c9a3c9a29155bc07dfc3a1bdb033ee23 Mon Sep 17 00:00:00 2001
From: Dhruv Chawla
Date: Wed, 13 Mar 2024 10:36:35 +0530
Subject: [PATCH] [AArch64][GlobalISel] Avoid splitting loads of large vector
 types into individual element loads

This patch adds custom legalization for G_LOAD where it splits loads of
fixed-width vector types larger than 128 bits into loads of 128-bit
vectors with the same element type. This is an improvement to what was
being done before where loads would be split into individual loads for
each element of the vector.
---
 .../AArch64/GISel/AArch64LegalizerInfo.cpp |   10 +-
 .../GlobalISel/legalize-load-store.mir     |   41 +-
 llvm/test/CodeGen/AArch64/vecreduce-add.ll | 1476 +++--
 3 files changed, 225 insertions(+), 1302 deletions(-)

diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index fea9d4495f44c7..2ae2923dfb353e 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -373,6 +373,11 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
       .legalForTypesWithMemDesc(
           {{s32, p0, s8, 8}, {s32, p0, s16, 8}, {s64, p0, s32, 8}})
       .widenScalarToNextPow2(0, /* MinSize = */ 8)
+      .clampMaxNumElements(0, s8, 16)
+      .clampMaxNumElements(0, s16, 8)
+      .clampMaxNumElements(0, s32, 4)
+      .clampMaxNumElements(0, s64, 2)
+      .clampMaxNumElements(0, p0, 2)
       .lowerIfMemSizeNotByteSizePow2()
       .clampScalar(0, s8, s64)
       .narrowScalarIf(
@@ -383,11 +388,6 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
             Query.Types[0].getSizeInBits() > 32;
           },
           changeTo(0, s32))
-      .clampMaxNumElements(0, s8, 16)
-      .clampMaxNumElements(0, s16, 8)
-      .clampMaxNumElements(0, s32, 4)
-      .clampMaxNumElements(0, s64, 2)
-      .clampMaxNumElements(0, p0, 2)
       // TODO: Use BITCAST for v2i8, v2i16 after G_TRUNC gets sorted out
       .bitcastIf(typeInSet(0, {v4s8}),
                  [=](const LegalityQuery &Query) {
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store.mir
index 5cbb8649d158b0..aa152aea81ff9c 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-load-store.mir
@@ -711,33 +711,24 @@ body:             |
     ; CHECK: liveins: $x0
     ; CHECK-NEXT: {{ $}}
    ; CHECK-NEXT: %ptr:_(p0) = COPY $x0
-    ; CHECK-NEXT: [[LOAD:%[0-9]+]]:_(p0) = G_LOAD %ptr(p0) :: (load (p0), align 64)
-    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 8
+    ; CHECK-NEXT: [[LOAD:%[0-9]+]]:_(<2 x s64>) = G_LOAD %ptr(p0) :: (load (<2 x s64>), align 64)
+    ; CHECK-NEXT: [[BITCAST:%[0-9]+]]:_(<2 x p0>) = G_BITCAST [[LOAD]](<2 x s64>)
+    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 16
     ; CHECK-NEXT: [[PTR_ADD:%[0-9]+]]:_(p0) = G_PTR_ADD %ptr, [[C]](s64)
-    ; CHECK-NEXT: [[LOAD1:%[0-9]+]]:_(p0) = G_LOAD [[PTR_ADD]](p0) :: (load (p0) from unknown-address + 8)
-    ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 16
+    ; CHECK-NEXT: [[LOAD1:%[0-9]+]]:_(<2 x s64>) = G_LOAD [[PTR_ADD]](p0) :: (load (<2 x s64>) from unknown-address + 16)
+    ; CHECK-NEXT: [[BITCAST1:%[0-9]+]]:_(<2 x p0>) = G_BITCAST [[LOAD1]](<2 x s64>)
+    ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 32
     ; CHECK-NEXT: [[PTR_ADD1:%[0-9]+]]:_(p0) = G_PTR_ADD %ptr, [[C1]](s64)
-    ; CHECK-NEXT: [[LOAD2:%[0-9]+]]:_(p0) = G_LOAD [[PTR_ADD1]](p0) :: (load (p0) from unknown-address + 16, align 16)
-    ; CHECK-NEXT: [[C2:%[0-9]+]]:_(s64) = G_CONSTANT i64 24
-    ; CHECK-NEXT: [[PTR_ADD2:%[0-9]+]]:_(p0) = G_PTR_ADD %ptr, [[C2]](s64)
-    ; CHECK-NEXT: [[LOAD3:%[0-9]+]]:_(p0) = G_LOAD [[PTR_ADD2]](p0) :: (load (p0) from unknown-address + 24)
-    ; CHECK-NEXT: [[C3:%[0-9]+]]:_(s64) = G_CONSTANT i64 32
-    ; CHECK-NEXT: [[PTR_ADD3:%[0-9]+]]:_(p0) = G_PTR_ADD %ptr, [[C3]](s64)
-    ; CHECK-NEXT: [[LOAD4:%[0-9]+]]:_(p0) = G_LOAD [[PTR_ADD3]](p0) :: (load (p0) from unknown-address + 32, align 32)
-    ; CHECK-NEXT: [[C4:%[0-9]+]]:_(s64) = G_CONSTANT i64 40
-    ; CHECK-NEXT: [[PTR_ADD4:%[0-9]+]]:_(p0) = G_PTR_ADD %ptr, [[C4]](s64)
-    ; CHECK-NEXT: [[LOAD5:%[0-9]+]]:_(p0) = G_LOAD [[PTR_ADD4]](p0) :: (load (p0) from unknown-address + 40)
-    ; CHECK-NEXT: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x p0>) = G_BUILD_VECTOR [[LOAD]](p0), [[LOAD1]](p0)
-    ; CHECK-NEXT: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x p0>) = G_BUILD_VECTOR [[LOAD2]](p0), [[LOAD3]](p0)
-    ; CHECK-NEXT: [[BUILD_VECTOR2:%[0-9]+]]:_(<2 x p0>) = G_BUILD_VECTOR [[LOAD4]](p0), [[LOAD5]](p0)
-    ; CHECK-NEXT: [[BITCAST:%[0-9]+]]:_(<2 x s64>) = G_BITCAST [[BUILD_VECTOR]](<2 x p0>)
-    ; CHECK-NEX
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Avoid splitting loads of large vector types into individual element loads (PR #85042)
dc03-work wrote:

> It looks like this comes from the lowerIfMemSizeNotByteSizePow2. Custom is
> often best avoided unless there is no other way, or the change is quite
> target-dependent.
>
> Can we try something like this instead?
>
> ```
>     .clampMaxNumElements(0, s8, 16)
>     .clampMaxNumElements(0, s16, 8)
>     .clampMaxNumElements(0, s32, 4)
>     .clampMaxNumElements(0, s64, 2)
>     .clampMaxNumElements(0, p0, 2)
>     .lowerIfMemSizeNotByteSizePow2()
>     ...
> ```

Yup, that works :)

https://github.com/llvm/llvm-project/pull/85042
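The reviewer's suggestion works because legalization rules are consulted in the order they are chained, and the first matching rule decides the action. A toy first-match model (illustrative only, not LLVM's actual implementation) shows how moving the clamp rules ahead of the generic lowering changes the outcome for the same query:

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy model of an ordered rule list: the first rule whose predicate
// matches the query determines the chosen legalization action.
struct Rule {
  std::function<bool(unsigned /*sizeInBits*/)> pred;
  std::string action;
};

std::string legalize(const std::vector<Rule> &rules, unsigned sizeInBits) {
  for (const Rule &r : rules)
    if (r.pred(sizeInBits))
      return r.action;
  return "legal";  // no rule matched
}
```

With a clamp-style rule listed before a lower-style rule, a wide vector query is clamped; with the order reversed, the same query falls into the generic lowering instead, which is the behavior the custom legalization was originally added to avoid.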
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Avoid splitting loads of large vector types into individual element loads (PR #85042)
https://github.com/dc03-work closed https://github.com/llvm/llvm-project/pull/85042
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Avoid splitting loads of large vector types into individual element loads (PR #85042)
https://github.com/dc03-work edited https://github.com/llvm/llvm-project/pull/85042
[llvm-branch-commits] [llvm] [AArch64][GlobalISel] Avoid splitting loads of large vector types into individual element loads (PR #85042)
dc03-work wrote:

Manually landed this PR in https://github.com/llvm/llvm-project/commit/208a9850e6a4b64ad6311361735d27a9c6cbd8ec, because I still don't understand how stacking on GitHub works...

https://github.com/llvm/llvm-project/pull/85042
[llvm-branch-commits] [AArch64] Remove usage of PostRAScheduler (PR #92871)
dc03-work wrote:

> > Submit your PRs to `main` branch
>
> I used [spr](https://getcord.github.io/spr/) to create this PR, so I think it's OK.

No, your target branch is wrong. You should be merging either into `main` or into a branch for another PR created by `spr`. However, in this case it appears you are targeting the same branch as your source.

https://github.com/llvm/llvm-project/pull/92871
[llvm-branch-commits] [AArch64] Remove usage of PostRAScheduler (PR #92871)
dc03-work wrote:

> > > > Submit your PRs to `main` branch
> > >
> > > I used [spr](https://getcord.github.io/spr/) to create this PR, so I think it's OK.
> >
> > No, your target branch is wrong. You should be merging either into `main` or into a branch for another PR created by `spr`. However, in this case it appears you are targeting the same branch as your source.
>
> These are two different branches: wangpc-pp wants to merge 1 commit into users/wangpc-pp/spr/main.aarch64-remove-usage-of-postrascheduler from users/wangpc-pp/spr/aarch64-remove-usage-of-postrascheduler

Okay, but looking at the commits of the target (https://github.com/llvm/llvm-project/commits/users/wangpc-pp/spr/main.aarch64-remove-usage-of-postrascheduler/), it appears that spr has squashed all main commits into one commit. In any case, this isn't the main branch. Change your PR to target the `main` branch.

https://github.com/llvm/llvm-project/pull/92871