Re: [PATCH] (not just) AArch64: Fold unsigned ADD + LSR by 1 to UHADD

Pengfei Li Fri, 02 May 2025 05:55:16 -0700

> Heh.  This is a bit of a hobby-horse of mine.  IMO we should be trying
> to make the generic, target-independent vector operations as useful
> as possible, so that people only need to resort to target-specific
> intrinsics if they're doing something genuinely target-specific.
> At the moment, we have the problem that the intrinsics are being
> used for two very different purposes:
> 
> (1) Let people who know the architecture well write high-level assembly.
>     For this use case, the compiler should only interfere with the
>     user's instruction selection if the compiler can be sure that
>     it's improving things.
>
> (2) Vector intrinsics just express dataflow, with no expectation from the
>     user about how the intrinsics will be implemented.  In this use case,
>     svand is "&, but for SVE vectors".  The user wants to do an "&",
>     looks up the SVE intrinsic for AND, and writes "svand" (or more
>     likely, uses a retargetable SIMD framework that does this for them).
>     Then the compiler is expected to map svand back to "&" internally.
>
> So yeah, IMO we should encourage users in group (2) to use C/C++
> operators or generic builtins where possible, since it expresses the
> intent better and is far less cumbersome.  And I agree that that's the
> more important case as far as this fold goes.  So personally I'd be
> happy with just that.
> 
> But getting nonzero_bits information out of intrinsics is a legitimate
> use case too.  It's up to you whether you want to go that far.


Thank you for sharing your thoughts. AFAIK, the initial motivation for this
fold was to optimize a code pattern in SLEEF, and that can be done without the
SVE built-in part. I thought the SVE built-in part could be extended to handle
more cases in the future, but I'm not sure there's a real need for it now.
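
For readers following along, the kind of pattern under discussion can be
sketched in portable C. This is only an illustration, not the SLEEF source
or the patch itself: the function names add_lsr1, uhadd_ref, top_bits_clear
and demo are hypothetical, and uhadd_ref is a scalar reference model of an
unsigned halving add (sum computed with one extra bit, then shifted right by
1, truncating). The two forms agree whenever the addition cannot wrap, for
example when both operands are known to have their top bit clear.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: the plain C form of the pattern, an unsigned ADD
   followed by a logical shift right by 1.  The 32-bit sum may wrap. */
static uint32_t add_lsr1(uint32_t a, uint32_t b)
{
    return (a + b) >> 1;
}

/* Hypothetical scalar model of an unsigned halving add: the sum is
   formed in 64 bits, so no carry is lost before the shift. */
static uint32_t uhadd_ref(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a + b) >> 1);
}

/* Assumed precondition for treating the two forms as equivalent in this
   sketch: the top bit of each operand is zero, so a + b fits in 32 bits. */
static int top_bits_clear(uint32_t a, uint32_t b)
{
    return ((a | b) & UINT32_C(0x80000000)) == 0;
}

/* Small self-check over a handful of values. */
static void demo(void)
{
    const uint32_t vals[] = { 0u, 1u, 3u, 0x7ffffffeu, 0x7fffffffu };
    const unsigned n = sizeof vals / sizeof vals[0];

    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            if (top_bits_clear(vals[i], vals[j]))
                assert(add_lsr1(vals[i], vals[j]) ==
                       uhadd_ref(vals[i], vals[j]));
}
```

Without the precondition the two forms can differ: with both operands equal
to 0x80000000, the 32-bit sum wraps to 0 before the shift, while the halving
add keeps the carry.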

So I'm going to split my patch and drop the SVE built-in part for now. I can
redo it later, either when there's a clear need or once I've figured out a
better way to implement it.

--
Thanks,
Pengfei
