yubing added inline comments.

================
Comment at: llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp:99
+  Loop *RowLoop = LI.AllocateLoop();
+  Loop *ColLoop = LI.AllocateLoop();
+  RowLoop->addChildLoop(ColLoop);
----------------
pengfei wrote:
> Not sure how about the arithmetic intrinsics. But at least for load and store
> intrinsics we can use LLVM intrinsic `llvm.masked.load/store` to reduce the
> inner loop.
I think we can compose a follow-up patch for this optimization.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93594/new/

https://reviews.llvm.org/D93594

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits