farzonl wrote:

> AArch64 has a udot and sdot instruction (and a usdot instruction). They perform a "partial" reduction though, producing a v4i32 from two v16i8 inputs. We would like to use those from the vectorizer and have recently added a partial-reduction intrinsic, but doing it with a higher level intrinsic might be a little nicer.

We haven't done it yet, but our plan here is to create a default expansion in `TargetLoweringBase.cpp`. Any backend that has specializations can then add them; in your case that would be in `AArch64ISelLowering.cpp`.

> It would seem like a "udot" can be represented already as `vecreduce.add(mul(zext, zext))`, and fdot is simpler still. Is there any particular reason to add a new intrinsic for it if it is already representable as a vecreduce? And it would feel like a shame if it couldn't be used with the actual AArch64 instructions.

There was a whole discussion on dot in https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294/13; check out `kparzysz`'s posts. Essentially, yes, we could represent dot this way, but then we would not be able to benefit from the ubiquity of the hardware-specific dot lowerings that are showing up across GPU and convolution use cases.

> @SamTebbs33 @NickGuy-Arm FYI.

https://github.com/llvm/llvm-project/pull/102872
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
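
For anyone following along, here is a rough scalar sketch in C++ of the two shapes being compared in this thread: the full reduction that `vecreduce.add(mul(zext, zext))` describes, and a udot-style "partial" reduction that turns two v16i8 inputs into a v4i32. The type aliases and function names (`V16i8`, `V4i32`, `full_dot`, `partial_dot`, `reduce_add`) are made up for illustration and are not LLVM or ACLE APIs. It only models the arithmetic, not how either form would actually be lowered.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the vector types mentioned in the thread.
using V16i8 = std::array<std::uint8_t, 16>;
using V4i32 = std::array<std::uint32_t, 4>;

// Full reduction: scalar meaning of vecreduce.add(mul(zext(a), zext(b))).
std::uint32_t full_dot(const V16i8 &a, const V16i8 &b) {
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += static_cast<std::uint32_t>(a[i]) * static_cast<std::uint32_t>(b[i]);
  return sum;
}

// Partial reduction in the style of udot: each 32-bit lane accumulates the
// widened products of its own group of four adjacent byte pairs.
V4i32 partial_dot(V4i32 acc, const V16i8 &a, const V16i8 &b) {
  for (std::size_t lane = 0; lane < acc.size(); ++lane)
    for (std::size_t k = 0; k < 4; ++k)
      acc[lane] += static_cast<std::uint32_t>(a[4 * lane + k]) *
                   static_cast<std::uint32_t>(b[4 * lane + k]);
  return acc;
}

// Horizontal add of the four lanes, finishing what partial_dot leaves undone.
std::uint32_t reduce_add(const V4i32 &v) {
  std::uint32_t sum = 0;
  for (std::uint32_t x : v)
    sum += x;
  return sum;
}

// Self-check: summing the partial lanes gives the same value as the full dot.
void self_check(const V16i8 &a, const V16i8 &b) {
  assert(reduce_add(partial_dot(V4i32{}, a, b)) == full_dot(a, b));
}
```

Calling `reduce_add(partial_dot(V4i32{}, a, b))` gives the same value as `full_dot(a, b)`, which is why the partial form fits a vectorized loop: the accumulator stays a vector across iterations and the horizontal add happens once at the end.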