John Naylor <john.nay...@2ndquadrant.com> writes:
> v2 had an Assert that was only correct while experimenting with
> eliding right shift. Fixed in v3.

I think there must have been something wrong with your test that said
that eliminating the right shift from the non-CLZ code made it slower.
It should be an unconditional win, just as it is for the CLZ code path.
(Maybe some odd cache-line-boundary effect?)  Also, I think it's just
weird to account for ALLOC_MINBITS one way in the CLZ path and the other
way in the other path.

I decided that it might be a good idea to do performance testing
in-place rather than in a standalone test program.  I whipped up the
attached that just does a bunch of palloc/pfree cycles.  I got the
following results on a non-cassert build (medians of a number of tests;
the times are repeatable to ~ 0.1% for me):

    HEAD:           2429.431 ms
    v3 CLZ:         2131.735 ms
    v3 non-CLZ:     2477.835 ms
    remove shift:   2266.755 ms

I didn't bother to try this on non-x86_64 architectures, as previous
testing convinces me the outcome should be about the same.

Hence, pushed that way, with a bit of additional cosmetic foolery:
the static assertion made more sense to me in relation to the
documented assumption that size <= ALLOC_CHUNK_LIMIT, and I thought
the comment could use some work.

			regards, tom lane

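For anyone following the shift discussion without the patch at hand, here is a minimal standalone sketch of the kind of free-list index computation being compared. It is not the committed code. The names sketch_free_index_clz and sketch_free_index_portable and the two SKETCH_* constants are invented for illustration, the constant values are assumptions, and a plain loop stands in for whatever the real non-CLZ path does. The point it tries to show is that both paths can locate the highest set bit of (size - 1) without first shifting right by the minimum-bits constant, and then account for that constant with the same subtraction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, for illustration only; consult the real allocator source. */
#define SKETCH_MINBITS		3					/* smallest chunk: 8 bytes */
#define SKETCH_CHUNK_LIMIT	((size_t) 8192)		/* largest free-list size */

/*
 * CLZ path: take the position of the highest set bit of (size - 1) using
 * the GCC/Clang builtin, then subtract the minimum-bits offset.
 */
static inline int
sketch_free_index_clz(size_t size)
{
	assert(size > 0 && size <= SKETCH_CHUNK_LIMIT);

	if (size <= ((size_t) 1 << SKETCH_MINBITS))
		return 0;				/* everything up to 8 bytes shares list 0 */

	return 31 - __builtin_clz((uint32_t) (size - 1)) - SKETCH_MINBITS + 1;
}

/*
 * Portable path: find the same bit position with a loop.  Note there is no
 * right shift by SKETCH_MINBITS up front; the offset is subtracted at the
 * end, exactly as in the CLZ path.
 */
static inline int
sketch_free_index_portable(size_t size)
{
	uint32_t	t;
	int			pos = 0;

	assert(size > 0 && size <= SKETCH_CHUNK_LIMIT);

	if (size <= ((size_t) 1 << SKETCH_MINBITS))
		return 0;

	t = (uint32_t) (size - 1);
	while (t >>= 1)				/* count how far the top bit is from bit 0 */
		pos++;

	return pos - SKETCH_MINBITS + 1;
}
```

With these assumed constants, a 9-byte request maps to index 1 (the 16-byte list) and an 8192-byte request maps to index 10, and the two functions agree for every size in between.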
/*

create function drive_palloc(count int) returns void
strict volatile language c as '.../drive_palloc.so';

\timing

select drive_palloc(10000000);

 */

#include "postgres.h"

#include "fmgr.h"
#include "miscadmin.h"
#include "tcop/tcopprot.h"
#include "utils/builtins.h"
#include "utils/memutils.h"

PG_MODULE_MAGIC;

/*
 * drive_palloc(count int) returns void
 */
PG_FUNCTION_INFO_V1(drive_palloc);
Datum
drive_palloc(PG_FUNCTION_ARGS)
{
	int32		count = PG_GETARG_INT32(0);

	while (count-- > 0)
	{
		for (size_t sz = 1; sz <= 8192; sz <<= 1)
		{
			void	   *p = palloc(sz);

			pfree(p);
		}

		CHECK_FOR_INTERRUPTS();
	}

	PG_RETURN_VOID();
}