On Wed, 1 Jun 2022 at 03:09, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Right now my vote would be to leave things as they stand for v15 ---
> the performance loss that started this thread occurs in a narrow
> enough set of circumstances that I don't feel too much angst about
> it being the price of winning in most other circumstances.  We can
> investigate these options at leisure for v16 or later.
I've been hesitating a little to put my views here as I wanted to see what the other views were first. My thoughts are generally in agreement with you, i.e., to do nothing for PG15 about this. My reasoning is:

1. Most cases are faster as a result of using generation contexts for sorting.

2. The slowdown cases seem rare and the speedup cases are much more common.

3. There were performance cliffs in PG14 if a column was added to a table to make the tuple size cross a power-of-2 boundary, which I don't recall anyone complaining about. PG15 makes the performance drop more gradual as tuple sizes increase, so performance is more predictable as a result.

4. As I just demonstrated in [1], if anyone is caught by this and has a problem, the work_mem increase required to get performance back to better than PG14 seems very small. I found that setting work_mem to 64.3MB makes PG15 faster than PG14 for the problem case. So anyone who happens to hit this case and finds the performance regression unacceptable has a way out... increase work_mem a little.

Also, in terms of what we might do to improve this situation for PG16:

I was also discussing this off-list with Andres, which resulted in him prototyping a patch [2] to store the memory context type in 3 bits within the 64 bits prior to the pointer. Those bits are used to look up a memory context method table so that we can call the correct function for the chunk's context type. I've been hacking around with this, added some optimisations, and got the memory allocation test [3] (modified to use aset.c rather than generation.c) showing very promising results when comparing this patch to master. There are still a few slowdowns, but 16-byte allocations up to 256-byte allocations are looking pretty good: up to ~10% faster compared to master.
(lower is better)

size   compare
8      114.86%
16      89.04%
32      90.95%
64      94.17%
128     93.36%
256     96.57%
512    101.25%
1024   109.88%
2048   100.87%

There's quite a bit more work to do for deciding how to handle large allocations, and there's also likely more that can be done to further shrink the existing chunk headers for each of the 3 existing memory allocators.

David

[1] https://www.postgresql.org/message-id/CAApHDvq8MoEMxHN+f=rccfwcfr30an1w3uokruunnplvrr3...@mail.gmail.com
[2] https://github.com/anarazel/postgres/tree/mctx-chunk
[3] https://www.postgresql.org/message-id/attachment/134021/allocate_performance_function.patch