efriedma added a comment.

> This makes it impossible to do a neat trick when using NEON intrinsics: one
> can load a number of constants using a single vector load, which are then
> repeatedly used to multiply whole vectors by one of the constants. This trick
> is used for a nice performance upside (2.5% to 4% on one microbenchmark) in
> libjpeg-turbo.
I'm not completely sure I follow here. Is the "trick" something like the following?

  int16x8_t v = {2,3,4,5,6,7,8,9};
  a = vqdmulh_laneq_s16(a, v, 0);
  b = vqdmulh_laneq_s16(b, v, 1);
  c = vqdmulh_laneq_s16(c, v, 2);
  d = vqdmulh_laneq_s16(d, v, 3);
  [...]

I can see how that could be helpful. The compiler could probably be taught to
recover something like the original structure, but it would probably require a
dedicated pass. Or I guess you could hack the source to use "volatile", but
that would be ugly.

I'm a little unhappy that we're forced to introduce more intrinsics here, but
it might be the best solution to avoid breaking carefully tuned code like
this.


================
Comment at: llvm/lib/IR/Function.cpp:1374
+      auto *RefVTy = dyn_cast<VectorType>(ArgTys[D.getArgumentNumber()]);
+      if (!VTy || !RefVTy || VTy->getBitWidth() != 64)
+        return true;
----------------
Hardcoding "64" and "128" in target-independent code here seems like a bad
idea. Can we just let both vector operands have any vector type, and reject in
the backend if we see an unexpected type?


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:6054
+    if (VT.getSizeInBits() == 64)
+      return std::make_pair(0U, &AArch64::FPR64_loRegClass);
     if (VT.getSizeInBits() == 128)
----------------
Is this related somehow?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71469/new/

https://reviews.llvm.org/D71469

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits