> Hm.  Any idea why that is?  I wonder if the compiler isn't using as many
> SVE registers as it could for this.

Not sure. We tried forcing loop unrolling by adding the line below to the
Makefile, but the results were the same.

pg_popcount_sve.o: CFLAGS += ${CFLAGS_UNROLL_LOOPS} -march=native


> I've also noticed that the latest patch doesn't compile on my M3 macOS
> machine.  After a quick glance, I think the problem is that the
> TRY_POPCNT_FAST macro is set, so it's trying to compile the assembly
> versions.

Fixed. The previous patch tried to reuse the existing "choose" logic, which is
guarded by TRY_POPCNT_FAST. The latest patch bypasses TRY_POPCNT_FAST and uses
separate choose logic for aarch64.
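
To illustrate the idea, here is a minimal sketch of a separate aarch64
"choose" step. It is not the code from the patch: the names example_popcount,
popcount_choose, sve_available and popcount_sve_placeholder are hypothetical,
the SVE kernel is replaced by a stand-in that calls the portable loop, and the
runtime check assumes Linux, where getauxval(AT_HWCAP) reports HWCAP_SVE. On
any other platform (macOS included) the sketch falls back to the portable
path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>			/* getauxval(), AT_HWCAP */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1 << 22)		/* value from the Linux arm64 UAPI headers */
#endif
#endif

/* Portable fallback: clear the lowest set bit until the byte is zero. */
static uint64_t
popcount_portable(const char *buf, size_t len)
{
	uint64_t	count = 0;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++)
	{
		unsigned char b = (unsigned char) buf[i];

		while (b)
		{
			b &= (unsigned char) (b - 1);
			count++;
		}
	}
	return count;
}

/* Hypothetical stand-in for an SVE kernel; a real one would use <arm_sve.h>. */
static uint64_t
popcount_sve_placeholder(const char *buf, size_t len)
{
	return popcount_portable(buf, len);
}

static uint64_t popcount_choose(const char *buf, size_t len);

/* The pointer starts at the chooser, so the first call does the detection. */
static uint64_t (*popcount_impl) (const char *buf, size_t len) = popcount_choose;

/* Runtime SVE check; only attempted on aarch64 Linux in this sketch. */
static int
sve_available(void)
{
#if defined(__aarch64__) && defined(__linux__)
	return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
	return 0;
#endif
}

/* Pick an implementation once, then forward the current call to it. */
static uint64_t
popcount_choose(const char *buf, size_t len)
{
	popcount_impl = sve_available() ? popcount_sve_placeholder : popcount_portable;
	return popcount_impl(buf, len);
}

/* Public entry point: every call after the first goes straight to the kernel. */
uint64_t
example_popcount(const char *buf, size_t len)
{
	return popcount_impl(buf, len);
}
```

Callers only ever use example_popcount(buf, len). Because the detection sits
behind its own platform guard rather than behind TRY_POPCNT_FAST, a build
where that macro is set for unrelated reasons never reaches the SVE path.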


-Chiranmoy

Attachment: v5-0001-SVE-support-for-popcount-and-popcount-masked.patch
