On Tue, 4 Mar 2025 at 20:50, Tomas Vondra <to...@vondra.me> wrote:
>
> I pushed the two smaller parts today.
>
> Here's the remaining two parts, to keep cfbot happy. I don't expect to
> get these into PG18, though.

As promised on- and off-list, here's the 0001 patch, polished, split, and further adapted for performance. As seen before, it reduces temp space requirements by up to 50%. I've not tested this against HEAD for performance.

It has been split into:

0001: Some API cleanup/changes that crept into the patch. This removes manual length-passing from the gin tuplesort APIs, instead relying on GinTuple's tuplen field. It's not critical for anything, and could be ignored if so desired.

0002: Tuplesort changes to allow tuplesort users to buffer and merge tuples during the sort operations. The patch was pulled directly from [0] (which was derived from earlier work in this thread), is fairly easy to understand, and has no other moving parts.

0003: Deduplication in tuplesort's flush-to-disk actions, using the API introduced in 0002. This reduces temporary disk usage by deduplicating data even further, for cases where there's a lot of duplicated data but the data has too many distinct values to fit in the available memory.

0004: Use a single tuplesort. This removes the worker-local tuplesort in favor of only storing data in the global one. This mainly reduces the code size and complexity of parallel GIN builds; we were already using that global sort for various tasks.

Open questions and open items for this:

- I did not yet update the pg_stat_progress systems, nor the docs.
- Maybe 0003 needs further splitting up: one patch for the optimizations in GinBuffer, one for the tuplesort buffering.
- Maybe we need to trim the buffer in gin's tuplesort flush?
- Maybe we should grow the GinBuffer->items array superlinearly rather than to the exact size requirement of the merge operation.

Apart from the complexities in 0003, I think the changes are fairly straightforward. I did not include the 0002 of the earlier patch, as it was WIP and its feature explicitly conflicts with my 0004.
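
To illustrate the idea behind 0003 and the last open item, here is a minimal standalone sketch, not code from the patches. It collapses runs of equal keys in an already-sorted in-memory batch before that batch would be written out, and grows the per-key item array by doubling instead of to the exact size needed. All names (SketchEntry, sketch_make, sketch_reserve, sketch_merge, sketch_flush_dedup) are hypothetical, keys are plain ints, item pointers are plain uint64 values, and it assumes the items of a later duplicate all sort after those of an earlier one, so a merge is a simple append.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one key plus its list of item pointers. */
typedef struct SketchEntry
{
    int       key;
    uint64_t *items;
    size_t    nitems;
    size_t    maxitems;     /* allocated capacity of items[] */
} SketchEntry;

/* Ensure room for 'needed' items, doubling capacity (superlinear growth). */
static void
sketch_reserve(SketchEntry *e, size_t needed)
{
    size_t    newcap;
    uint64_t *p;

    if (needed <= e->maxitems)
        return;
    newcap = e->maxitems ? e->maxitems : 4;
    while (newcap < needed)
        newcap *= 2;
    p = realloc(e->items, newcap * sizeof(uint64_t));
    if (p == NULL)
        abort();
    e->items = p;
    e->maxitems = newcap;
}

/* Build an entry that owns a copy of the given items. */
static SketchEntry
sketch_make(int key, const uint64_t *items, size_t n)
{
    SketchEntry e = {key, NULL, 0, 0};

    sketch_reserve(&e, n);
    if (n > 0)
        memcpy(e.items, items, n * sizeof(uint64_t));
    e.nitems = n;
    return e;
}

/* Append src's items to dst. Assumes equal keys and src sorting after dst. */
static void
sketch_merge(SketchEntry *dst, const SketchEntry *src)
{
    assert(dst->key == src->key);
    if (src->nitems == 0)
        return;
    sketch_reserve(dst, dst->nitems + src->nitems);
    memcpy(dst->items + dst->nitems, src->items,
           src->nitems * sizeof(uint64_t));
    dst->nitems += src->nitems;
}

/*
 * Collapse runs of equal keys in a key-sorted array, in place.
 * Returns the number of entries left; those are what would be flushed.
 */
static size_t
sketch_flush_dedup(SketchEntry *entries, size_t n)
{
    size_t out = 0;

    if (n == 0)
        return 0;
    for (size_t i = 1; i < n; i++)
    {
        if (entries[i].key == entries[out].key)
        {
            /* Duplicate key: fold it into the surviving entry. */
            sketch_merge(&entries[out], &entries[i]);
            free(entries[i].items);
        }
        else
        {
            /* New key: move it into the next output slot. */
            out++;
            if (out == i)
                continue;
            entries[out] = entries[i];
        }
        /* Slot i no longer owns any memory. */
        entries[i].items = NULL;
        entries[i].nitems = entries[i].maxitems = 0;
    }
    return out + 1;
}
```

With three sorted entries for keys 5, 5 and 9, sketch_flush_dedup() leaves two entries, and the merged entry for key 5 ends up with a capacity larger than its item count, so a further duplicate can be appended without another reallocation. The trade-off is the one raised in the open items: doubling saves reallocations during repeated merges, but leaves slack in the buffer that may need trimming before the flush.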

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0] https://www.postgresql.org/message-id/CAEze2WhRFzd=nvh9YevwiLjrS1j1fP85vjNCXAab=iybz2r...@mail.gmail.com

v20250307-0004-Make-Gin-parallel-builds-use-a-single-tupl.patch
v20250307-0002-Allow-tuplesort-implementations-to-buffer-.patch
v20250307-0001-Remove-size-argument-from-GIN-tuplesort-in.patch
v20250307-0003-Merge-GinTuples-during-tuplesort-before-fl.patch