Re: [HACKERS] [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-21 Thread Tom Lane
Jeremy Kerr writes: > Rather than testing single bits in a loop, change AllocSetFreeIndex to > use the __builtin_clz() function to calculate the chunk index. Per subsequent discussion and testing, I've committed a patch that uses a lookup table instead of depending on __builtin_clz(). http://arch

Re: [HACKERS] [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-21 Thread Tom Lane
p...@thetdh.com writes: > Normally I'd try a small lookup table (1-byte index to 1-byte value) > in this case. But if the bitscan instruction were even close in > performance, it'd be preferable, due to its more-reliable caching > behavior; Well, the problem with the bitscan instruction is we

Re: [HACKERS] [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-21 Thread pg
Normally I'd try a small lookup table (1-byte index to 1-byte value) in this case. But if the bitscan instruction were even close in performance, it'd be preferable, due to its more-reliable caching behavior; it should be possible to capture this at code-configuration time (aligned so as to pro

Re: [HACKERS] [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-21 Thread Tom Lane
Jeremy Kerr writes: > Thanks for the benchmark app, thought I'd pitch in with some ppc > results: It looks to me like we should go with the lookup table approach, as being the best tradeoff of speed improvement vs platform and compiler independence. The "float hack" is right out ;-) I wonder w

Re: [HACKERS] [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-20 Thread Jeremy Kerr
Hi Greg, Thanks for the benchmark app, thought I'd pitch in with some ppc results: > clz 1.530s > lookup table 1.720s > float hack 4.424s > unrolled 5.280s > normal 5.369s POWER5+: clz 2.046s lookup table 2.18

Re: [HACKERS] [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-20 Thread Tom Lane
Greg Stark writes: > Well it was a bit of a pain but I filled an array with (1/1000 scaled > down) values and then permuted them. I also went ahead and set the > low-order bits to random values since the lookup table based algorithm > might be affected by it. > The results are a bit disappointing

Re: [HACKERS] [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-20 Thread Greg Stark
On Tue, Jul 21, 2009 at 12:07 AM, Tom Lane wrote: > Greg Stark writes: >> I also wonder if this microbenchmark is actually ok because it's >> testing the same value over and over so any branch-prediction will >> shine unrealistically well. > > Yeah, that is a good point --- and it would benefit th

Re: [HACKERS] [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-20 Thread Tom Lane
Greg Stark writes: > I also wonder if this microbenchmark is actually ok because it's > testing the same value over and over so any branch-prediction will > shine unrealistically well. Yeah, that is a good point --- and it would benefit the unrolled loop more than the other versions. We should p

Re: [HACKERS] [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-20 Thread Greg Stark
Doh, I forgot this bit. Will rerun tests now. On Mon, Jul 20, 2009 at 8:25 PM, Tom Lane wrote: >  (Note: I found > out that it's necessary to split the test into two files --- otherwise > gcc will inline AllocSetFreeIndex and partially const-fold the work, > leading to skewed results.) -- gre

Re: [HACKERS] [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-20 Thread Greg Stark
On Mon, Jul 20, 2009 at 8:37 PM, Tom Lane wrote: > Anyone want to see if they can beat that?  Some testing on other > architectures would help too. Hm, I took the three implementations so far (normal, unrolled, and clz) as well as the two from http://graphics.stanford.edu/~seander/bithacks.html#In

Re: [HACKERS] [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-20 Thread Tom Lane
I wrote: > I'm still interested in the idea of doing a manual unroll instead of > relying on a compiler-specific feature. However, some quick testing > didn't find an unrolling that helps much. Hmm, actually this seems to work ok: idx++; size >>= 1; if (size != 0)

Re: [HACKERS] [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-20 Thread Tom Lane
Stefan Kaltenbrunner writes: > Tom Lane wrote: >> and it turns out that Intel hasn't seen fit to put a lot of effort into >> the BSR instruction. It's constant time, all right, but on most of >> their CPUs that constant time is like 8 or 16 times slower than an ADD; >> cf http://www.intel.com/Ass

Re: [HACKERS] [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-20 Thread Stefan Kaltenbrunner
Tom Lane wrote: Jeremy Kerr writes: Rather than testing single bits in a loop, change AllocSetFreeIndex to use the __builtin_clz() function to calculate the chunk index. This requires a new check for __builtin_clz in the configure script. Results in a ~2% performance increase on sysbench

Re: [HACKERS] [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-20 Thread Tom Lane
Jeremy Kerr writes: > Rather than testing single bits in a loop, change AllocSetFreeIndex to > use the __builtin_clz() function to calculate the chunk index. > This requires a new check for __builtin_clz in the configure script. > Results in a ~2% performance increase on sysbench on PowerPC. I

[HACKERS] [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-19 Thread Jeremy Kerr
Rather than testing single bits in a loop, change AllocSetFreeIndex to use the __builtin_clz() function to calculate the chunk index. This requires a new check for __builtin_clz in the configure script. Results in a ~2% performance increase on sysbench on PowerPC. Signed-off-by: Jeremy Kerr --