[llvm-branch-commits] [llvm] dda6003 - [AArch64] Attempt to sink mul operands
Author: Nicholas Guy
Date: 2021-01-13T15:23:36Z
New Revision: dda60035e9f0769c8907cdf6561489e0435c2275

URL: https://github.com/llvm/llvm-project/commit/dda60035e9f0769c8907cdf6561489e0435c2275
DIFF: https://github.com/llvm/llvm-project/commit/dda60035e9f0769c8907cdf6561489e0435c2275.diff

LOG: [AArch64] Attempt to sink mul operands

Following on from D91255, this patch is responsible for sinking relevant mul
operands to the same block so that umull/smull instructions can be correctly
generated by the mul combine implemented in the aforementioned patch.

Differential revision: https://reviews.llvm.org/D91271

Added:
    llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll

Modified:
    llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Removed:

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index b500cd534a1f..082fdf390786 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -10956,6 +10956,43 @@ bool AArch64TargetLowering::shouldSinkOperands(
     return true;
   }
+  case Instruction::Mul: {
+    bool IsProfitable = false;
+    for (auto &Op : I->operands()) {
+      // Make sure we are not already sinking this operand
+      if (any_of(Ops, [&](Use *U) { return U->get() == Op; }))
+        continue;
+
+      ShuffleVectorInst *Shuffle = dyn_cast<ShuffleVectorInst>(Op);
+      if (!Shuffle || !Shuffle->isZeroEltSplat())
+        continue;
+
+      Value *ShuffleOperand = Shuffle->getOperand(0);
+      InsertElementInst *Insert = dyn_cast<InsertElementInst>(ShuffleOperand);
+      if (!Insert)
+        continue;
+
+      Instruction *OperandInstr = dyn_cast<Instruction>(Insert->getOperand(1));
+      if (!OperandInstr)
+        continue;
+
+      ConstantInt *ElementConstant =
+          dyn_cast<ConstantInt>(Insert->getOperand(2));
+      // Check that the insertelement is inserting into element 0
+      if (!ElementConstant || ElementConstant->getZExtValue() != 0)
+        continue;
+
+      unsigned Opcode = OperandInstr->getOpcode();
+      if (Opcode != Instruction::SExt && Opcode != Instruction::ZExt)
+        continue;
+
+      Ops.push_back(&Shuffle->getOperandUse(0));
+      Ops.push_back(&Op);
+      IsProfitable = true;
+    }
+
+    return IsProfitable;
+  }
   default:
     return false;
   }

diff --git a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
new file mode 100644
index ..966cf7b46daa
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
@@ -0,0 +1,186 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=aarch64-none-linux-gnu < %s -o - | FileCheck %s
+
+define void @matrix_mul_unsigned(i32 %N, i32* nocapture %C, i16* nocapture readonly %A, i16 %val) {
+; CHECK-LABEL: matrix_mul_unsigned:
+; CHECK:       // %bb.0: // %vector.header
+; CHECK-NEXT:    and w9, w3, #0x
+; CHECK-NEXT:    // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT:    and x8, x0, #0xfff8
+; CHECK-NEXT:    dup v0.4h, w9
+; CHECK-NEXT:  .LBB0_1: // %vector.body
+; CHECK-NEXT:    // =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    add x9, x2, w0, uxtw #1
+; CHECK-NEXT:    ldp d1, d2, [x9]
+; CHECK-NEXT:    add x9, x1, w0, uxtw #2
+; CHECK-NEXT:    subs x8, x8, #8 // =8
+; CHECK-NEXT:    add w0, w0, #8 // =8
+; CHECK-NEXT:    umull v1.4s, v0.4h, v1.4h
+; CHECK-NEXT:    umull v2.4s, v0.4h, v2.4h
+; CHECK-NEXT:    stp q1, q2, [x9]
+; CHECK-NEXT:    b.ne .LBB0_1
+; CHECK-NEXT:  // %bb.2: // %for.end12
+; CHECK-NEXT:    ret
+vector.header:
+  %conv4 = zext i16 %val to i32
+  %wide.trip.count = zext i32 %N to i64
+  %0 = add nsw i64 %wide.trip.count, -1
+  %min.iters.check = icmp ult i32 %N, 8
+  %1 = trunc i64 %0 to i32
+  %2 = icmp ugt i64 %0, 4294967295
+  %n.vec = and i64 %wide.trip.count, 4294967288
+  %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %conv4, i32 0
+  %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
+  %broadcast.splatinsert31 = insertelement <4 x i32> undef, i32 %conv4, i32 0
+  %broadcast.splat32 = shufflevector <4 x i32> %broadcast.splatinsert31, <4 x i32> undef, <4 x i32> zeroinitializer
+  %cmp.n = icmp eq i64 %n.vec, %wide.trip.count
+  br label %vector.body
+
+vector.body: ; preds = %vector.header, %vector.body
+  %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.header ]
+  %3 = trunc i64 %index to i32
+  %4 = add i32 %N, %3
+  %5 = zext i32 %4 to i64
+  %6 = getelementptr inbounds i16, i16* %A, i64 %5
+  %7 = bitcast i16* %6 to <4 x i16>*
+  %wide.load = load <4 x i16>, <4 x i16>* %7, align 2
+  %8 = getelementptr inbounds i16, i16* %6, i64 4
+  %9 = bitcast i16* %8 to <4 x i16>*
+  %wide.load30 = load <4 x i16>, <4 x i16>* %9, align 2
+  %10 = zext <4 x i16> %wide.load to <4 x i32>
[llvm-branch-commits] [llvm] f5fcbe4 - [AArch64] Further restricts when a dup(*ext) can be rearranged
Author: Nicholas Guy
Date: 2021-01-18T16:00:21Z
New Revision: f5fcbe4e3c68584ef4858590a079f17593feabbd

URL: https://github.com/llvm/llvm-project/commit/f5fcbe4e3c68584ef4858590a079f17593feabbd
DIFF: https://github.com/llvm/llvm-project/commit/f5fcbe4e3c68584ef4858590a079f17593feabbd.diff

LOG: [AArch64] Further restricts when a dup(*ext) can be rearranged

In most cases, the dup(*ext) pattern can be rearranged to perform the
extension on the vector side, allowing for further vector-specific
optimisations to be made. However the initial checks for this conversion
were insufficient, allowing invalid encodings to be attempted (causing
compilation to fail).

Differential Revision: https://reviews.llvm.org/D94778

Added:
    llvm/test/CodeGen/AArch64/aarch64-dup-ext-crash.ll

Modified:
    llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Removed:

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 6e4ac0f711dd..39c40ef0b36d 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -11843,7 +11843,8 @@ static SDValue performCommonVectorExtendCombine(SDValue VectorShuffle,
   SDValue InsertVectorNode = DAG.getNode(
       InsertVectorElt.getOpcode(), DL, PreExtendVT, DAG.getUNDEF(PreExtendVT),
-      Extend.getOperand(0), DAG.getConstant(0, DL, MVT::i64));
+      DAG.getAnyExtOrTrunc(Extend.getOperand(0), DL, PreExtendType),
+      DAG.getConstant(0, DL, MVT::i64));

   std::vector<int> ShuffleMask(TargetType.getVectorElementCount().getValue());
@@ -11851,9 +11852,8 @@ static SDValue performCommonVectorExtendCombine(SDValue VectorShuffle,
       DAG.getVectorShuffle(PreExtendVT, DL, InsertVectorNode,
                            DAG.getUNDEF(PreExtendVT), ShuffleMask);

-  SDValue ExtendNode =
-      DAG.getNode(IsSExt ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND, DL, TargetType,
-                  VectorShuffleNode, DAG.getValueType(TargetType));
+  SDValue ExtendNode = DAG.getNode(IsSExt ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND,
+                                   DL, TargetType, VectorShuffleNode);

   return ExtendNode;
 }

diff --git a/llvm/test/CodeGen/AArch64/aarch64-dup-ext-crash.ll b/llvm/test/CodeGen/AArch64/aarch64-dup-ext-crash.ll
new file mode 100644
index ..51f91aa1b940
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/aarch64-dup-ext-crash.ll
@@ -0,0 +1,33 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -o - | FileCheck %s
+
+target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+target triple = "aarch64-unknown-linux-gnu"
+
+; This test covers a case where an AArch64 DUP instruction is generated with an
+; invalid encoding, resulting in a crash. We don't care about the specific output
+; here, only that this case no longer causes said crash.
+define dso_local i32 @dupext_crashtest(i32 %e) local_unnamed_addr {
+; CHECK-LABEL: dupext_crashtest:
+for.body.lr.ph:
+  %conv314 = zext i32 %e to i64
+  br label %vector.memcheck
+
+vector.memcheck: ; preds = %for.body.lr.ph
+  br label %vector.ph
+
+vector.ph: ; preds = %vector.memcheck
+  %broadcast.splatinsert = insertelement <2 x i64> poison, i64 %conv314, i32 0
+  %broadcast.splat = shufflevector <2 x i64> %broadcast.splatinsert, <2 x i64> poison, <2 x i32> zeroinitializer
+  br label %vector.body
+
+vector.body: ; preds = %vector.body, %vector.ph
+  %wide.load = load <2 x i32>, <2 x i32>* undef, align 4
+  %0 = zext <2 x i32> %wide.load to <2 x i64>
+  %1 = mul nuw <2 x i64> %broadcast.splat, %0
+  %2 = trunc <2 x i64> %1 to <2 x i32>
+  %3 = select <2 x i1> undef, <2 x i32> undef, <2 x i32> %2
+  %4 = bitcast i32* undef to <2 x i32>*
+  store <2 x i32> %3, <2 x i32>* %4, align 4
+  br label %vector.body
+}

_______________________________________________
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 16bf02c - Reland "[AArch64] Attempt to sink mul operands"
Author: Nicholas Guy
Date: 2021-01-18T16:00:22Z
New Revision: 16bf02c3a19d4e1f4a19cb243de612e17f54f5a9

URL: https://github.com/llvm/llvm-project/commit/16bf02c3a19d4e1f4a19cb243de612e17f54f5a9
DIFF: https://github.com/llvm/llvm-project/commit/16bf02c3a19d4e1f4a19cb243de612e17f54f5a9.diff

LOG: Reland "[AArch64] Attempt to sink mul operands"

This relands dda60035e9f0769c8907cdf6561489e0435c2275, which was reverted
by dbaa6a1858a42f72b683f700d3bd7a9632f7a518

Added:
    llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll

Modified:
    llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Removed:

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 39c40ef0b36d6..cc64e0e03ad88 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -10965,6 +10965,43 @@ bool AArch64TargetLowering::shouldSinkOperands(
     return true;
   }
+  case Instruction::Mul: {
+    bool IsProfitable = false;
+    for (auto &Op : I->operands()) {
+      // Make sure we are not already sinking this operand
+      if (any_of(Ops, [&](Use *U) { return U->get() == Op; }))
+        continue;
+
+      ShuffleVectorInst *Shuffle = dyn_cast<ShuffleVectorInst>(Op);
+      if (!Shuffle || !Shuffle->isZeroEltSplat())
+        continue;
+
+      Value *ShuffleOperand = Shuffle->getOperand(0);
+      InsertElementInst *Insert = dyn_cast<InsertElementInst>(ShuffleOperand);
+      if (!Insert)
+        continue;
+
+      Instruction *OperandInstr = dyn_cast<Instruction>(Insert->getOperand(1));
+      if (!OperandInstr)
+        continue;
+
+      ConstantInt *ElementConstant =
+          dyn_cast<ConstantInt>(Insert->getOperand(2));
+      // Check that the insertelement is inserting into element 0
+      if (!ElementConstant || ElementConstant->getZExtValue() != 0)
+        continue;
+
+      unsigned Opcode = OperandInstr->getOpcode();
+      if (Opcode != Instruction::SExt && Opcode != Instruction::ZExt)
+        continue;
+
+      Ops.push_back(&Shuffle->getOperandUse(0));
+      Ops.push_back(&Op);
+      IsProfitable = true;
+    }
+
+    return IsProfitable;
+  }
   default:
     return false;
   }

diff --git a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
new file mode 100644
index 0..966cf7b46daa5
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
@@ -0,0 +1,186 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=aarch64-none-linux-gnu < %s -o - | FileCheck %s
+
+define void @matrix_mul_unsigned(i32 %N, i32* nocapture %C, i16* nocapture readonly %A, i16 %val) {
+; CHECK-LABEL: matrix_mul_unsigned:
+; CHECK:       // %bb.0: // %vector.header
+; CHECK-NEXT:    and w9, w3, #0x
+; CHECK-NEXT:    // kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT:    and x8, x0, #0xfff8
+; CHECK-NEXT:    dup v0.4h, w9
+; CHECK-NEXT:  .LBB0_1: // %vector.body
+; CHECK-NEXT:    // =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    add x9, x2, w0, uxtw #1
+; CHECK-NEXT:    ldp d1, d2, [x9]
+; CHECK-NEXT:    add x9, x1, w0, uxtw #2
+; CHECK-NEXT:    subs x8, x8, #8 // =8
+; CHECK-NEXT:    add w0, w0, #8 // =8
+; CHECK-NEXT:    umull v1.4s, v0.4h, v1.4h
+; CHECK-NEXT:    umull v2.4s, v0.4h, v2.4h
+; CHECK-NEXT:    stp q1, q2, [x9]
+; CHECK-NEXT:    b.ne .LBB0_1
+; CHECK-NEXT:  // %bb.2: // %for.end12
+; CHECK-NEXT:    ret
+vector.header:
+  %conv4 = zext i16 %val to i32
+  %wide.trip.count = zext i32 %N to i64
+  %0 = add nsw i64 %wide.trip.count, -1
+  %min.iters.check = icmp ult i32 %N, 8
+  %1 = trunc i64 %0 to i32
+  %2 = icmp ugt i64 %0, 4294967295
+  %n.vec = and i64 %wide.trip.count, 4294967288
+  %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %conv4, i32 0
+  %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
+  %broadcast.splatinsert31 = insertelement <4 x i32> undef, i32 %conv4, i32 0
+  %broadcast.splat32 = shufflevector <4 x i32> %broadcast.splatinsert31, <4 x i32> undef, <4 x i32> zeroinitializer
+  %cmp.n = icmp eq i64 %n.vec, %wide.trip.count
+  br label %vector.body
+
+vector.body: ; preds = %vector.header, %vector.body
+  %index = phi i64 [ %index.next, %vector.body ], [ 0, %vector.header ]
+  %3 = trunc i64 %index to i32
+  %4 = add i32 %N, %3
+  %5 = zext i32 %4 to i64
+  %6 = getelementptr inbounds i16, i16* %A, i64 %5
+  %7 = bitcast i16* %6 to <4 x i16>*
+  %wide.load = load <4 x i16>, <4 x i16>* %7, align 2
+  %8 = getelementptr inbounds i16, i16* %6, i64 4
+  %9 = bitcast i16* %8 to <4 x i16>*
+  %wide.load30 = load <4 x i16>, <4 x i16>* %9, align 2
+  %10 = zext <4 x i16> %wide.load to <4 x i32>
+  %11 = zext <4 x i16> %wide.load30 to <4 x i32>
+  %12 = mul nuw nsw <4 x i32> %broadcast.splat, %10
+  %13 = mul
[llvm-branch-commits] [llvm] 350247a - [AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup))
Author: Nicholas Guy
Date: 2021-01-06T16:02:16Z
New Revision: 350247a93c07906300b79955ff882004a92ae368

URL: https://github.com/llvm/llvm-project/commit/350247a93c07906300b79955ff882004a92ae368
DIFF: https://github.com/llvm/llvm-project/commit/350247a93c07906300b79955ff882004a92ae368.diff

LOG: [AArch64] Rearrange mul(dup(sext/zext)) to mul(sext/zext(dup))

Performing this rearrangement allows for existing patterns to match cases
where the vector may be built after an extend, instead of before.

Differential Revision: https://reviews.llvm.org/D91255

Added:
    llvm/test/CodeGen/AArch64/aarch64-dup-ext-scalable.ll
    llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll

Modified:
    llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Removed:

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 41dc285a368d..40435c12ca3b 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -11705,9 +11705,152 @@ static bool IsSVECntIntrinsic(SDValue S) {
   return false;
 }

+/// Calculates what the pre-extend type is, based on the extension
+/// operation node provided by \p Extend.
+///
+/// In the case that \p Extend is a SIGN_EXTEND or a ZERO_EXTEND, the
+/// pre-extend type is pulled directly from the operand, while other extend
+/// operations need a bit more inspection to get this information.
+///
+/// \param Extend The SDNode from the DAG that represents the extend operation
+/// \param DAG The SelectionDAG hosting the \p Extend node
+///
+/// \returns The type representing the \p Extend source type, or \p MVT::Other
+/// if no valid type can be determined
+static EVT calculatePreExtendType(SDValue Extend, SelectionDAG &DAG) {
+  switch (Extend.getOpcode()) {
+  case ISD::SIGN_EXTEND:
+  case ISD::ZERO_EXTEND:
+    return Extend.getOperand(0).getValueType();
+  case ISD::AssertSext:
+  case ISD::AssertZext:
+  case ISD::SIGN_EXTEND_INREG: {
+    VTSDNode *TypeNode = dyn_cast<VTSDNode>(Extend.getOperand(1));
+    if (!TypeNode)
+      return MVT::Other;
+    return TypeNode->getVT();
+  }
+  case ISD::AND: {
+    ConstantSDNode *Constant =
+        dyn_cast<ConstantSDNode>(Extend.getOperand(1).getNode());
+    if (!Constant)
+      return MVT::Other;
+
+    uint32_t Mask = Constant->getZExtValue();
+
+    if (Mask == UCHAR_MAX)
+      return MVT::i8;
+    else if (Mask == USHRT_MAX)
+      return MVT::i16;
+    else if (Mask == UINT_MAX)
+      return MVT::i32;
+
+    return MVT::Other;
+  }
+  default:
+    return MVT::Other;
+  }
+
+  llvm_unreachable("Code path unhandled in calculatePreExtendType!");
+}
+
+/// Combines a dup(sext/zext) node pattern into sext/zext(dup)
+/// making use of the vector SExt/ZExt rather than the scalar SExt/ZExt
+static SDValue performCommonVectorExtendCombine(SDValue VectorShuffle,
+                                                SelectionDAG &DAG) {
+
+  ShuffleVectorSDNode *ShuffleNode =
+      dyn_cast<ShuffleVectorSDNode>(VectorShuffle.getNode());
+  if (!ShuffleNode)
+    return SDValue();
+
+  // Ensuring the mask is zero before continuing
+  if (!ShuffleNode->isSplat() || ShuffleNode->getSplatIndex() != 0)
+    return SDValue();
+
+  SDValue InsertVectorElt = VectorShuffle.getOperand(0);
+
+  if (InsertVectorElt.getOpcode() != ISD::INSERT_VECTOR_ELT)
+    return SDValue();
+
+  SDValue InsertLane = InsertVectorElt.getOperand(2);
+  ConstantSDNode *Constant = dyn_cast<ConstantSDNode>(InsertLane.getNode());
+  // Ensures the insert is inserting into lane 0
+  if (!Constant || Constant->getZExtValue() != 0)
+    return SDValue();
+
+  SDValue Extend = InsertVectorElt.getOperand(1);
+  unsigned ExtendOpcode = Extend.getOpcode();
+
+  bool IsSExt = ExtendOpcode == ISD::SIGN_EXTEND ||
+                ExtendOpcode == ISD::SIGN_EXTEND_INREG ||
+                ExtendOpcode == ISD::AssertSext;
+  if (!IsSExt && ExtendOpcode != ISD::ZERO_EXTEND &&
+      ExtendOpcode != ISD::AssertZext && ExtendOpcode != ISD::AND)
+    return SDValue();
+
+  EVT TargetType = VectorShuffle.getValueType();
+  EVT PreExtendType = calculatePreExtendType(Extend, DAG);
+
+  if ((TargetType != MVT::v8i16 && TargetType != MVT::v4i32 &&
+       TargetType != MVT::v2i64) ||
+      (PreExtendType == MVT::Other))
+    return SDValue();
+
+  EVT PreExtendVT = TargetType.changeVectorElementType(PreExtendType);
+
+  if (PreExtendVT.getVectorElementCount() != TargetType.getVectorElementCount())
+    return SDValue();
+
+  if (TargetType.getScalarSizeInBits() != PreExtendVT.getScalarSizeInBits() * 2)
+    return SDValue();
+
+  SDLoc DL(VectorShuffle);
+
+  SDValue InsertVectorNode = DAG.getNode(
+      InsertVectorElt.getOpcode(), DL, PreExtendVT, DAG.getUNDEF(PreExtendVT),
+      Extend.getOperand(0), DAG.getConstant(0, DL, MVT::i64));
+
+  std::vector<int> ShuffleMask(TargetType.getVectorElementCount().getValue());
+
+  SDValue VectorShuffleNode =
+      DAG.getVector
[llvm-branch-commits] [llvm] ed23229 - [AArch64] Fix crash caused by invalid vector element type
Author: Nicholas Guy
Date: 2021-01-08T12:02:54Z
New Revision: ed23229a64aed5b9d6120d57138d475291ca3667

URL: https://github.com/llvm/llvm-project/commit/ed23229a64aed5b9d6120d57138d475291ca3667
DIFF: https://github.com/llvm/llvm-project/commit/ed23229a64aed5b9d6120d57138d475291ca3667.diff

LOG: [AArch64] Fix crash caused by invalid vector element type

Fixes a crash caused by D91255, when LLVMTy is null when calling
changeExtendedVectorElementType.

Differential Revision: https://reviews.llvm.org/D94234

Added:
    llvm/test/CodeGen/AArch64/aarch64-dup-ext-vectortype-crash.ll

Modified:
    llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Removed:

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 926d952425d0..80a203b9e7ef 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -11810,6 +11810,11 @@ static SDValue performCommonVectorExtendCombine(SDValue VectorShuffle,
       (PreExtendType == MVT::Other))
     return SDValue();

+  // Restrict valid pre-extend data type
+  if (PreExtendType != MVT::i8 && PreExtendType != MVT::i16 &&
+      PreExtendType != MVT::i32)
+    return SDValue();
+
   EVT PreExtendVT = TargetType.changeVectorElementType(PreExtendType);

   if (PreExtendVT.getVectorElementCount() != TargetType.getVectorElementCount())

diff --git a/llvm/test/CodeGen/AArch64/aarch64-dup-ext-vectortype-crash.ll b/llvm/test/CodeGen/AArch64/aarch64-dup-ext-vectortype-crash.ll
new file mode 100644
index ..995d9a19e543
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/aarch64-dup-ext-vectortype-crash.ll
@@ -0,0 +1,16 @@
+; RUN: llc < %s -mtriple aarch64-none-linux-gnu | FileCheck %s
+
+; This test covers a case where extended value types can't be converted to
+; vector types, resulting in a crash. We don't care about the specific output
+; here, only that this case no longer causes said crash.
+; See https://reviews.llvm.org/D91255#2484399 for context
+define <8 x i16> @extend_i7_v8i16(i7 %src, <8 x i8> %b) {
+; CHECK-LABEL: extend_i7_v8i16:
+entry:
+%in = sext i7 %src to i16
+%ext.b = sext <8 x i8> %b to <8 x i16>
+%broadcast.splatinsert = insertelement <8 x i16> undef, i16 %in, i16 0
+%broadcast.splat = shufflevector <8 x i16> %broadcast.splatinsert, <8 x i16> undef, <8 x i32> zeroinitializer
+%out = mul nsw <8 x i16> %broadcast.splat, %ext.b
+ret <8 x i16> %out
+}
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -5026,10 +5026,23 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef<ElementCount> VFs,
         // even in the scalar case.
         RegUsage[ClassID] += 1;
       } else {
+        // The output from scaled phis and scaled reductions actually have
+        // fewer lanes than the VF.
+        auto VF = VFs[J];
+        if (auto *ReductionR = dyn_cast<VPReductionPHIRecipe>(R))
+          VF = VF.divideCoefficientBy(ReductionR->getVFScaleFactor());
+        else if (auto *PartialReductionR =
+                     dyn_cast<VPPartialReductionRecipe>(R))
+          VF = VF.divideCoefficientBy(PartialReductionR->getScaleFactor());
+        if (VF != VFs[J])

NickGuy-Arm wrote:

Nit: If the condition is only used for debug output, can it be moved inside the LLVM_DEBUG?

https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/NickGuy-Arm commented:

Looks generally good to me so far, with a few nitpicks.

https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
NickGuy-Arm wrote:

Could you pre-commit this test, so we can see how the output changes before and after the changes in LoopVectorize.cpp?

https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
https://github.com/NickGuy-Arm edited
https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -2031,17 +2033,19 @@ class VPReductionPHIRecipe : public VPHeaderPHIRecipe,
 /// scalar value.
 class VPPartialReductionRecipe : public VPSingleDefRecipe {
   unsigned Opcode;
+  unsigned ScaleFactor;

NickGuy-Arm wrote:

Nit: Could this be `VFScaleFactor` to match the equivalent in `VPReductionPHIRecipe`?

https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] [LV] Reduce register usage for scaled reductions (PR #133090)
@@ -5026,10 +5026,23 @@ calculateRegisterUsage(VPlan &Plan, ArrayRef<ElementCount> VFs,
         // even in the scalar case.
         RegUsage[ClassID] += 1;
       } else {
+        // The output from scaled phis and scaled reductions actually have
+        // fewer lanes than the VF.
+        auto VF = VFs[J];
+        if (auto *ReductionR = dyn_cast<VPReductionPHIRecipe>(R))

NickGuy-Arm wrote:

[Idle thought, feel free to ignore] I wonder if there's precedent to add a `getVFScaleFactor` or equivalent to the base recipe class (or one of the other subclasses), and allow any recipe to override it instead of explicitly checking for every type that could scale the VF. Likely not yet, and almost certainly not in this patch, but maybe something to consider in the future?

https://github.com/llvm/llvm-project/pull/133090
[llvm-branch-commits] [llvm] release/20.x: [LV] Fix crash when building partial reductions using types that aren't known scale factors (#136680) (PR #136863)
NickGuy-Arm wrote:

I can verify that updating the test files doesn't impact the test itself. Looks to be some instruction reordering but no change to the functionality being tested, and this test passes on main without any further changes.

How do we go about updating the test on this branch, as I assume we don't have commit access to llvmbot's fork?

https://github.com/llvm/llvm-project/pull/136863