farzonl wrote:

> AArch64 has a udot and sdot instruction (and a usdot instruction). They 
> perform a "partial" reduction though, producing a v4i32 from two v16i8 
> inputs. We would like to use those from the vectorizer and have recently 
> added a partial-reduction intrinsic, but doing it with a higher level 
> intrinsic might be a little nicer.

We haven't done it yet, but our plan here is to create a default expansion in 
`TargetLoweringBase.cpp`. Any backend that has specializations can then add 
them; in your case, that would be in `AArch64ISelLowering.cpp`.

> 
> It would seem like a "udot" can be represented already as 
> `vecreduce.add(mul(zext, zext))`, and fdot is simpler still. Is there any 
> particular reason to add a new intrinsic for it if it is already 
> representable as a vecreduce? And it would feel like a shame if it couldn't 
> be used with the actual AArch64 instructions.
> 
There was a whole discussion on dot in 
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294/13 (check out 
`kparzysz`'s posts). Essentially, yes, we could represent dot this way, but 
then we would not be able to benefit from the ubiquity of the 
hardware-specific dot lowerings that are showing up across GPU and 
convolution use cases.
> @SamTebbs33 @NickGuy-Arm FYI.
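
For readers following along, here is a minimal scalar sketch in C++ of the two shapes being compared above: a udot-style "partial" reduction (two 16 x i8 inputs producing 4 x i32 lanes) and the full `vecreduce.add(mul(zext, zext))` form. The function names are hypothetical and are not LLVM APIs. The sketch assumes each output lane sums four adjacent products and starts from a zero accumulator, whereas the real instruction accumulates into an existing register.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a udot-style partial reduction.
// Assumption: output lane k sums the products of input elements 4k..4k+3,
// starting from zero rather than from an existing accumulator.
std::array<std::uint32_t, 4> udot_partial(const std::array<std::uint8_t, 16>& a,
                                          const std::array<std::uint8_t, 16>& b) {
    std::array<std::uint32_t, 4> acc{};  // zero-initialized lanes
    for (std::size_t i = 0; i < 16; ++i) {
        // Widen both operands before multiplying (the "zext" step).
        acc[i / 4] += static_cast<std::uint32_t>(a[i]) *
                      static_cast<std::uint32_t>(b[i]);
    }
    return acc;
}

// Hypothetical scalar model of vecreduce.add(mul(zext(a), zext(b))):
// the same widened products, reduced all the way down to one scalar.
std::uint32_t dot_full(const std::array<std::uint8_t, 16>& a,
                       const std::array<std::uint8_t, 16>& b) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < 16; ++i) {
        sum += static_cast<std::uint32_t>(a[i]) *
               static_cast<std::uint32_t>(b[i]);
    }
    return sum;
}

// Finish a partial reduction by adding the four lanes together.
std::uint32_t reduce_lanes(const std::array<std::uint32_t, 4>& v) {
    return v[0] + v[1] + v[2] + v[3];
}
```

Under these assumptions, `reduce_lanes(udot_partial(a, b))` equals `dot_full(a, b)` for any inputs, which is why the vecreduce form can in principle be matched to the partial-reduction instruction followed by a final lane sum.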



https://github.com/llvm/llvm-project/pull/102872
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
