Hi John, On Tue, Jun 29, 2021 at 7:13 AM John Naylor <john.nay...@enterprisedb.com> wrote: > I plan to do some performance testing with VACUUM, ANALYZE etc soon, to see > if I can detect any significant differences.
Thanks! > I did a quick check of the MacOS/clang binary size (no debug symbols): > > master: 8108408 > 0001-0009: 8125224 Not too bad. > Later, I'll drill down into the individual patches and see if anything stands > out. > > There were already some comments for v2 upthread about formatting and an > overflow hazard, but I did find a few more things to ask about: Right, here's an update with fixes discussed earlier with Zhihong and Tom: * COMPARE_TYPE -> COMPARE_RET_TYPE * quit fighting with pgindent (I will try to fix this problem generally later) * fix overflow hazard > - For my curiosity, there are a lot of calls to qsort/qunique in the tree -- > without having looked exhaustively, do these patches focus on cases where > there are bespoke comparator functions and/or hot code paths? Patches 0006-0009 are highly specialised for local usage by a single module, and require some kind of evidence that they're worth their bytes, and the onus is on me there of course -- but any ideas and feedback are welcome. There are other opportunities like these, maybe better ones. That reminds me: I recently had a perf report from Andres that showed the qsort in compute_scalar_stats() as quite hot. That's probably a good candidate, and is not yet done in the current patch set. The lower numbered patches are all things that are reused in many places, and in my humble opinion improve the notation and type safety and code deduplication generally when working with common types ItemPtr, BlockNumber, Oid, aside from any performance arguments. At least the ItemPtr stuff *might* also speed something useful up. I tried to measure a speedup in vacuum, but so far I have not. I did learn some things though: While doing that with an uncorrelated index and a lot of deleted tuples, I found that adding more maintenance_work_mem doesn't help beyond a few MB, because then cache misses dominate to the point where it's not better than doing multiple passes (and this is familiar to me from work on hash joins). If I turned on huge pages on Linux and set min_dynamic_shared_memory so that the parallel DSM used by vacuum lives in huge pages, then parallel vacuum with a large maintenance_work_mem starts to do much better than non-parallel vacuum by improving the TLB misses (as with hash joins). I thought that was quite interesting! Perhaps bsearch_itemptr might help with correlated indexes with a lot of deleted indexes (so not dominated by cache misses), though? (I wouldn't be suprised if someone comes up with a much better idea than bsearch for that anyway... a few ideas have been suggested.) > - Aside from the qsort{_arg} precedence, is there a practical reason for > keeping the new global functions in their own files? Better idea for layout welcome. One thing I wondered while trying to figure out where to put functions that operate on itemptr: why is itemptr_encode() in src/include/catalog/index.h?! > - 0002 / 0004 > > +/* Search and unique functions inline in header. */ > > The functions are pretty small, but is there some advantage for inlining > these? Glibc's bsearch definition is already in a header for inlining (as is our qunique), so I thought I should preserve that characteristic on principle. I don't have any evidence though. Other libcs I looked at didn't have bsearch in a header. So by doing this we make the generated code the same across platforms (all other relevant things being equal). I don't know if it really makes much difference, especially since in this case the comparator and size would still be inlined if we defined it in the .c (unlike standard bsearch)... Probably only lazy_tid_reaped() calls it enough to potentially show any difference in a non-microbenchmark workload, if anything does. > - 0003 > > #include "lib/qunique.h" is not needed anymore. Fixed. > This isn't quite relevant for the current patch perhaps, but I'm wondering > why we don't already call bsearch for RelationHasSysCache() and > RelationSupportsSysCache(). Right, I missed that. Done. Nice to delete some more code. > - 0008 > > +#define ST_COMPARE(a, b, cxt) \ > + DatumGetInt32(FunctionCall2Coll(&cxt->flinfo, cxt->collation, *a, *b)) > > This seems like a pretty heavyweight comparison, so I'm not sure inlining > buys us much, but it seems also there are fewer branches this way. I'll come > up with a test and see what happens. I will be very interested to see the results. Thanks!
more-sort-search-specializations-v3.tgz
Description: application/compressed-tar