On 20/12/18 6:53, David Rowley wrote:
Back in 2016 [1] there was some discussion about using the POPCNT
instruction to improve the performance of counting the number of bits
set in a word.  Improving this helps various cases, such as
bms_num_members and also things like counting the allvisible and
frozen pages in the visibility map.

[snip]

I've put together a very rough patch to implement using POPCNT and the
leading and trailing 0-bit instructions to improve the performance of
bms_next_member() and bms_prev_member().  The correct function should
be determined on the first call to each function by way of setting a
function pointer variable to the most suitable supported
implementation.   I've not yet gone through and removed all the
number_of_ones[] arrays to replace with a pg_popcount*() call.

IMVHO: Please do not disregard potential optimization by the compiler around those calls.. o_0  That might explain the reduced performance improvement observed.

Not that I can see any obvious alternative to your implementation right away ...

That
seems to have mostly been done in Thomas' patch [3], part of which
I've used for the visibilitymap.c code changes.  If this patch proves
to be possible, then I'll look at including the other changes Thomas
made in his patch too.

What I'm really looking for by posting now are reasons why we can't do
this. I'm also interested in getting some testing done on older
machines, particularly machines with processors that are from before
2007, both AMD and Intel.

I can offer a 2005-vintage Opteron 2216 rev3 (bought late 2007) to test on. Feel free to toss me some test code.

cpuinfo flags:    fpu de tsc msr pae mce cx8 apic mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good nopl extd_apicid eagerfpu pni cx16 hypervisor lahf_lm cmp_legacy 3dnowprefetch vmmcall

  2007-2008 seems to be around the time both
AMD and Intel added support for POPCNT and LZCNT, going by [4].

Thanks



Reply via email to