Re: [PATCH] (not just) AArch64: Fold unsigned ADD + LSR by 1 to UHADD

Richard Sandiford Fri, 02 May 2025 01:23:24 -0700

Pengfei Li <pengfei....@arm.com> writes:
> Thank you for the comments.
>
>> I don't think we can use an unbounded recursive walk, since that
>> would become quadratic if we ever used it when optimising one
>> AND in a chain of ANDs.  (And using this function for ANDs
>> seems plausible.)  Maybe we should be handling the information
>> in a similar way to Ranger.
>
> I'm trying to get rid of the recursion by reusing the code in 
> get_nonzero_bits().
>
>> Rather than handle the built-in case entirely in target code, how about
>> having a target hook into nonzero_element_bits (or whatever replaces it)
>> for machine-dependent builtins?
>
> From the perspective of necessity, do you think it's worth checking the 
> "svand" call inside, or worth handling the whole built-in case? Operations 
> with ACLE SVE types can also be folded as long as we use C/C++ general 
> operators, which have been supported in GCC 15.


Heh.  This is a bit of a hobby-horse of mine.  IMO we should be trying
to make the generic, target-independent vector operations as useful
as possible, so that people only need to resort to target-specific
intrinsics if they're doing something genuinely target-specific.
At the moment, we have the problem that the intrinsics are being
used for two very different purposes:

(1) Let people who know the architecture well write high-level assembly.
    For this use case, the compiler should only interfere with the
    user's instruction selection if the compiler can be sure that
    it's improving things.

(2) Vector intrinsics just express dataflow, with no expectation from the
    user about how the intrinsics will be implemented.  In this use case,
    svand is "&, but for SVE vectors".  The user wants to do an "&",
    looks up the SVE intrinsic for AND, and writes "svand" (or more
    likely, uses a retargetable SIMD framework that does this for them).
    Then the compiler is expected to map svand back to "&" internally.

So yeah, IMO we should encourage users in group (2) to use C/C++
operators or generic builtins where possible, since it expresses the
intent better and is far less cumbersome.  And I agree that that's the
more important case as far as this fold goes.  So personally I'd be
happy with just that.
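To illustrate the group (2) style that the fold targets, here is a minimal sketch using GCC's generic vector extension (`vector_size`) rather than ACLE intrinsics. The function names are hypothetical, and the sketch makes no claim about whether a particular GCC version actually emits UHADD for it. It only shows the pattern: masking with plain `&` clears each lane's top bit, so the 8-bit add cannot wrap and `(a + b) >> 1` equals an unsigned halving add. That known-zero-bits fact is what the fold would need to prove.

```c
#include <stdint.h>

/* Generic (target-independent) vector type: 16 lanes of uint8_t. */
typedef uint8_t v16u8 __attribute__((vector_size(16)));

/* Hypothetical example written with ordinary C operators.
   After "& 0x7f" every lane is at most 0x7f, so each lane of x + y
   is at most 0xfe and the 8-bit vector add cannot wrap.  Under that
   condition (x + y) >> 1 equals an unsigned halving add of x and y. */
v16u8 masked_halving_add(v16u8 a, v16u8 b)
{
    v16u8 x = a & 0x7f;   /* scalar operand is broadcast to all lanes */
    v16u8 y = b & 0x7f;
    return (x + y) >> 1;  /* per-lane logical shift right by 1 */
}

/* Scalar reference for a halving add: the sum is formed in a wider
   type, so the carry out of bit 7 is kept before shifting. */
uint8_t ref_halving_add(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b) >> 1);
}
```

Without the masks the lane sum could wrap (for example 0xff + 0xff), and the shifted result would then differ from a true halving add. That is why the fold depends on nonzero-bits information flowing through the `&`.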

But getting nonzero_bits information out of intrinsics is a legitimate
use case too.  It's up to you whether you want to go that far.

Thanks,
Richard
