pengfei added inline comments.
================ Comment at: llvm/lib/IR/ConstantFold.cpp:539 + if (V->isNullValue() && !DestTy->isX86_MMXTy() && !DestTy->isX86_AMXTy() + && opc != Instruction::AddrSpaceCast) return Constant::getNullValue(DestTy); ---------------- Operation should at the end of the line. ================ Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:55 + switch (II->getIntrinsicID()) { + default: { + Row = II->getArgOperand(0); ---------------- I think we'd better to check exceptions. E.g. ``` default: llvm_unreachable(""); case Intrinsic::x86_tileloadd64_internal: case Intrinsic::x86_tdpbssd_internal: case Intrinsic::x86_tilestored64_internal: Row = II->getArgOperand(0); Col = II->getArgOperand(1); break; ``` ================ Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:114 + Value *Row = nullptr, *Col = nullptr; + Use &U = *(Bitcast->use_begin()); + unsigned OpNo = U.getOperandNo(); ---------------- Why don't check empty like line 157? ================ Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:193 + // %2 = load <256 x i32>, <256 x i32>* %addr, align 1024 + auto *II = cast<IntrinsicInst>(Src); + Value *Row = II->getOperand(0); ---------------- Is it possible the x86_amx operand isn't from AMX intrinsic, e.g. ``` %src = bitcast <256 x i32> %xxx to x86_amx %2 = bitcast x86_amx %src to <256 x i32> ``` ================ Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:233 bool X86LowerAMXType::visit() { bool C; + SmallVector<Instruction *, 8> DeadInsts; ---------------- Better move it to line 310. ================ Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:238 for (Instruction &Inst : BB) { - LoadInst *LD = dyn_cast<LoadInst>(&Inst); - // Check load instruction. - // %3 = load <256 x i32>, <256 x i32>* %1, align 64 - if (LD) { - FixedVectorType *VTy = dyn_cast<FixedVectorType>(Inst.getType()); - if (!IsAMXType(VTy)) - continue; - LDSet.insert(&Inst); + if (!dyn_cast<BitCastInst>(&Inst)) continue; ---------------- Better to reuse the cast result, e.g. ``` BitCastInst *BInst = dyn_cast<BitCastInst>(&Inst); if (!BInst ) ``` You can save several `cast<BitCastInst>(&Inst)` below. ================ Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:265 + // If the dst type is <256 x i32>*, it is valid intruction. + // %0 = bitcast x86_amx* %tile to <256 x i32>* + // %1 = load <256 x i32>, <256 x i32>* %0, align 64 ---------------- Where's `x86_amx* %tile` from? Shouldn't been transfered to `x86_amx` before this bitcast if it exists? ================ Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:271 + LoadInst *LD = dyn_cast<LoadInst>(Src); + if (!LD || !LD->hasOneUser()) { + transformBitcast(cast<BitCastInst>(&Inst)); ---------------- Maybe better to keep a duplicated `load` that calling `transformBitcast`. The same for line 285. ================ Comment at: llvm/lib/Target/X86/X86LowerAMXType.cpp:286 + if (!Inst.hasOneUser()) { + transformBitcast(cast<BitCastInst>(&Inst)); + DeadInsts.push_back(&Inst); ---------------- Why we need to consider <256 x i32> has more than one use? ================ Comment at: llvm/test/CodeGen/X86/AMX/amx-across-func.ll:89-91 attributes #2 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="8192" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+amx-int8,+amx-tile,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" } attributes #3 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+amx-int8,+amx-tile,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" } attributes #4 = { nounwind } ---------------- Better to remove these unused attributes. The same to other tests. ================ Comment at: llvm/test/CodeGen/X86/AMX/amx-type.ll:67 + +define dso_local void @test_src_add(<256 x i32> %x, <256 x i32> %y, i16 %r, i16 %c, i8* %buf, i64 %s) #2 { +; CHECK-LABEL: @test_src_add( ---------------- For this and the next test, we have chances to optimize to memcpy if we can make sure %s is constant 64. ================ Comment at: llvm/test/CodeGen/X86/AMX/amx-type.ll:103 + define dso_local void @test_load(i8* %in, i8* %out) local_unnamed_addr #2 { ; CHECK-LABEL: @test_load( ---------------- We don't need to check this case now, right? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D91927/new/ https://reviews.llvm.org/D91927 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits