Pengfei Li <pengfei....@arm.com> writes:
> Thank you for the comments.
>
>> I don't think we can use an unbounded recursive walk, since that
>> would become quadratic if we ever used it when optimising one
>> AND in a chain of ANDs.  (And using this function for ANDs
>> seems plausible.)  Maybe we should be handling the information
>> in a similar way to Ranger.
>
> I'm trying to get rid of the recursion by reusing the code in
> get_nonzero_bits().
>
>> Rather than handle the built-in case entirely in target code, how about
>> having a target hook into nonzero_element_bits (or whatever replaces it)
>> for machine-dependent builtins?
>
> From the perspective of necessity, do you think it's worth checking the
> "svand" call inside, or worth handling the whole built-in case? Operations
> with ACLE SVE types can also be folded as long as we use C/C++ general
> operators which has been supported in GCC 15.

Heh.  This is a bit of a hobby-horse of mine.  IMO we should be trying
to make the generic, target-independent vector operations as useful as
possible, so that people only need to resort to target-specific
intrinsics if they're doing something genuinely target-specific.

At the moment, we have the problem that the intrinsics are being used
for two very different purposes:

(1) Let people who know the architecture well write high-level assembly.
    For this use case, the compiler should only interfere with the
    user's instruction selection if the compiler can be sure that it's
    improving things.

(2) Vector intrinsics just express dataflow, with no expectation from
    the user about how the intrinsics will be implemented.  In this use
    case, svand is "&, but for SVE vectors".  The user wants to do an
    "&", looks up the SVE intrinsic for AND, and writes "svand" (or more
    likely, uses a retargetable SIMD framework that does this for them).
    Then the compiler is expected to map svand back to "&" internally.

So yeah, IMO we should encourage users in group (2) to use C/C++
operators or generic builtins where possible, since it expresses the
intent better and is far less cumbersome.  And I agree that that's the
more important case as far as this fold goes.  So personally I'd be
happy with just that.

But getting nonzero_bits information out of intrinsics is a legitimate
use case too.  It's up to you whether you want to go that far.

Thanks,
Richard
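
For illustration, the group (2) style of writing "&" directly might look
like the minimal sketch below.  It is not taken from the patch: it assumes
the GCC/Clang generic vector extension (vector_size) rather than ACLE SVE
types, so that it builds on any target, and the names v4u32, and_generic
and mask_low_byte are hypothetical.  With SVE types, the same "&" would
take the place of an svand call.

```c
#include <stdint.h>

/* Assumption: GCC/Clang generic vector extension, four 32-bit lanes.
   This stands in for an SVE vector type purely for illustration.  */
typedef uint32_t v4u32 __attribute__((vector_size(16)));

/* Group (2) style: a plain "&" expresses the dataflow directly,
   with no target-specific intrinsic involved.  */
static v4u32 and_generic(v4u32 a, v4u32 b)
{
    return a & b;
}

/* After ANDing every lane with 0xff, only the low 8 bits of each
   element can be nonzero.  That per-element fact is the kind of
   information a nonzero-bits query could report for the result.  */
static v4u32 mask_low_byte(v4u32 x)
{
    const v4u32 m = {0xff, 0xff, 0xff, 0xff};
    return and_generic(x, m);
}
```

Because the operation is written as "&", the compiler sees an ordinary
AND without first having to map an intrinsic call back to one.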