Currently, we vectorize CTZ for SVE by using the following operation:
.CTZ (X) = (PREC - 1) - .CLZ (X & -X)

Instead, this patch expands CTZ to RBIT + CLZ for SVE, as suggested in PR109498.
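
To illustrate why the two expansions agree, here is a minimal scalar sketch in
C for 32-bit elements. It is not the code from the patch: the helper names
(clz32, rbit32, ctz_via_isolate, ctz_via_rbit) are hypothetical, and the
loops only model what the CLZ and RBIT instructions compute per element.

```c
#include <assert.h>
#include <stdint.h>

/* Model of CLZ on one 32-bit element; returns 32 for a zero input. */
static int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Model of RBIT on one 32-bit element: reverse the bit order. */
static uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Current expansion: (PREC - 1) - CLZ (X & -X), with PREC = 32.
   X & -X isolates the lowest set bit.  Only meaningful for X != 0.  */
static int ctz_via_isolate(uint32_t x)
{
    return 31 - clz32(x & (0u - x));
}

/* Proposed expansion: reversing the bits turns trailing zeros into
   leading zeros, so CTZ (X) = CLZ (RBIT (X)).  */
static int ctz_via_rbit(uint32_t x)
{
    return clz32(rbit32(x));
}
```

For nonzero inputs both functions return the same value, for example 2 for
x = 12. In this sketch they differ for x = 0, where the RBIT + CLZ form
returns 32 and the isolate form returns -1.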

The patch was bootstrapped and regtested on aarch64-linux-gnu with no regressions.
OK for mainline?

Signed-off-by: Soumya AR <soum...@nvidia.com>

gcc/ChangeLog:
        PR target/109498
        * config/aarch64/aarch64-sve.md (ctz<mode>2): Add pattern to expand
        CTZ to RBIT + CLZ for SVE.

gcc/testsuite/ChangeLog:
        PR target/109498
        * gcc.target/aarch64/sve/ctz.c: New test.

Attachment: 0001-aarch64-Expand-CTZ-to-RBIT-CLZ-for-SVE-PR109498.patch
