> Hm. Any idea why that is? I wonder if the compiler isn't using as many
> SVE registers as it could for this.
Not sure. We tried forcing loop unrolling with the following line in the Makefile, but the results are the same:

    pg_popcount_sve.o: CFLAGS += ${CFLAGS_UNROLL_LOOPS} -march=native

> I've also noticed that the latest patch doesn't compile on my M3 macOS
> machine. After a quick glance, I think the problem is that the
> TRY_POPCNT_FAST macro is set, so it's trying to compile the assembly
> versions.

Fixed. We had tried reusing the existing "choose" logic guarded by TRY_POPCNT_FAST. The latest patch bypasses TRY_POPCNT_FAST by having separate choose logic for aarch64.

-Chiranmoy
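The aarch64 "choose" logic mentioned above follows a common runtime-dispatch pattern: a function pointer initially points at a chooser, which on the first call probes the CPU, repoints the pointer at the best implementation, and forwards the call. The C sketch below is illustrative only and is not the code from the patch. All function names in it (popcount_choose, popcount_sve, popcount_portable, sve_available, pg_popcount_sketch) are hypothetical. To stay compilable on any platform, it builds the SVE kernel only when the compiler targets SVE (__ARM_FEATURE_SVE). It probes the hardware with the Linux-specific getauxval(AT_HWCAP) & HWCAP_SVE check, so other systems, macOS for instance, fall through to the portable loop. One common alternative is to compile just the SVE function with a target attribute, which keeps the rest of the binary generic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Portable fallback: count set bits one byte at a time. */
static uint64_t
popcount_portable(const char *buf, int bytes)
{
    uint64_t total = 0;

    for (int i = 0; i < bytes; i++)
    {
        unsigned char b = (unsigned char) buf[i];

        while (b)
        {
            b &= (unsigned char) (b - 1);   /* clear the lowest set bit */
            total++;
        }
    }
    return total;
}

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#if defined(__linux__)
#include <sys/auxv.h>                       /* getauxval(), AT_HWCAP */
#endif

/* Hypothetical runtime probe; assumes Linux hwcaps, else reports no SVE. */
static bool
sve_available(void)
{
#if defined(__linux__) && defined(HWCAP_SVE)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;
#endif
}

/* Hypothetical SVE kernel: predicated, vector-length-agnostic loop. */
static uint64_t
popcount_sve(const char *buf, int bytes)
{
    const uint8_t *p = (const uint8_t *) buf;
    uint64_t    total = 0;
    int64_t     i = 0;
    svbool_t    pg = svwhilelt_b8_s64(i, (int64_t) bytes);

    while (svptest_any(svptrue_b8(), pg))
    {
        svuint8_t   v = svld1_u8(pg, p + i);   /* inactive lanes not loaded */

        total += svaddv_u8(pg, svcnt_u8_z(pg, v));  /* per-byte popcount, summed */
        i += (int64_t) svcntb();            /* advance by one vector of bytes */
        pg = svwhilelt_b8_s64(i, (int64_t) bytes);
    }
    return total;
}
#endif

static uint64_t popcount_choose(const char *buf, int bytes);

/* Starts at the chooser; repointed to the real implementation on first call. */
static uint64_t (*popcount_impl) (const char *buf, int bytes) = popcount_choose;

static uint64_t
popcount_choose(const char *buf, int bytes)
{
    popcount_impl = popcount_portable;
#if defined(__ARM_FEATURE_SVE)
    if (sve_available())
        popcount_impl = popcount_sve;
#endif
    return popcount_impl(buf, bytes);       /* forward the first call */
}

/* Hypothetical public entry point. */
uint64_t
pg_popcount_sketch(const char *buf, int bytes)
{
    return popcount_impl(buf, bytes);
}
```

For example, pg_popcount_sketch("abc", 3) returns 10. Only the first call pays for the CPU probe; later calls go straight through the pointer.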
v5-0001-SVE-support-for-popcount-and-popcount-masked.patch