Currently, we vectorize CTZ for SVE by using the following operation:

  .CTZ (X) = (PREC - 1) - .CLZ (X & -X)

Instead, this patch expands CTZ to RBIT + CLZ for SVE, as suggested in
PR109498.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Soumya AR <soum...@nvidia.com>

gcc/ChangeLog:

	PR target/109498
	* config/aarch64/aarch64-sve.md (ctz<mode>2): Added pattern to expand
	CTZ to RBIT + CLZ for SVE.

gcc/testsuite/ChangeLog:

	PR target/109498
	* gcc.target/aarch64/sve/ctz.c: New test.

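For illustration only, here is a scalar C sketch of the two formulations for a 32-bit element (PREC = 32). It is not the SVE pattern from the patch; the helper names clz32, rbit32, ctz_old and ctz_new are hypothetical and simply model CLZ, RBIT and the two expansions with plain loops. In this model the two forms agree for every nonzero input and differ only at X = 0.

```c
#include <assert.h>
#include <stdint.h>

#define PREC 32

/* Scalar model of CLZ: number of leading zero bits, PREC when x == 0.  */
static unsigned
clz32 (uint32_t x)
{
  unsigned n = 0;
  for (uint32_t bit = UINT32_C (1) << (PREC - 1); bit && !(x & bit); bit >>= 1)
    n++;
  return n;
}

/* Scalar model of RBIT: reverse the order of the PREC bits in x.  */
static uint32_t
rbit32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < PREC; i++)
    {
      r = (r << 1) | (x & 1);
      x >>= 1;
    }
  return r;
}

/* Existing formulation: isolate the lowest set bit with X & -X, then
   convert its leading-zero count into a trailing-zero count.  */
static unsigned
ctz_old (uint32_t x)
{
  return (PREC - 1) - clz32 (x & (0u - x));
}

/* Formulation used by the patch: trailing zeros of X are the leading
   zeros of the bit-reversed X.  */
static unsigned
ctz_new (uint32_t x)
{
  return clz32 (rbit32 (x));
}

/* Hypothetical checker: count disagreements over all single-bit inputs
   and a few mixed patterns (nonzero inputs only).  */
static unsigned
count_mismatches (void)
{
  unsigned bad = 0;
  for (int i = 0; i < PREC; i++)
    {
      uint32_t one = UINT32_C (1) << i;
      uint32_t mixed = one | UINT32_C (0x80000000);
      bad += ctz_old (one) != ctz_new (one);
      bad += ctz_old (mixed) != ctz_new (mixed);
    }
  return bad;
}
```

For X = 0 this model gives PREC for the RBIT + CLZ form, whereas the subtraction in the older form wraps around, so any comparison between the two should be restricted to nonzero inputs.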
0001-aarch64-Expand-CTZ-to-RBIT-CLZ-for-SVE-PR109498.patch