Hackers,

While reviewing and benchmarking 91e9e89dc (Make nodeSort.c use Datum sorts for single column sorts), I noticed that we store the tuple data in a separate memory context: if the sort fits in memory we just reset that context when we're done, and if it does not, we dump the tuples to disk in the same order they're added and reset the context at that point. There is a little pfree() work going on via writetup_heap(), which I think could possibly just be removed to get some additional gains.
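For reference, the pfree() work in question is the per-tuple cleanup at the tail of writetup_heap(), which from memory looks roughly like this (paraphrased, not verbatim from the current sources):

    if (!state->slabAllocatorUsed)
    {
        /* adjust the sort's memory accounting, then free the tuple's copy */
        FREEMEM(state, GetMemoryChunkSpace(tuple));
        heap_free_minimal_tuple(tuple);   /* effectively one pfree() per tuple written */
    }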
Anyway, this context that stores tuples uses the standard aset.c allocator, which has the usual power-of-2 wastage plus the additional overhead of freelists etc. I wondered how much faster it would go if I set it to use a generation context instead of an aset.c one.

After running make installcheck to create the tenk1 table and then running the attached tuplesortbench script, I get this:

Master:
work_mem = '4MB';
  Sort Method: external merge  Disk: 2496kB
work_mem = '4GB';
  Sort Method: quicksort  Memory: 5541kB

Patched:
work_mem = '4MB';
  Sort Method: quicksort  Memory: 3197kB

So it seems to save quite a bit of memory by getting away from rounding allocations up to the next power of 2.

Performance-wise, there are some pretty good gains. (Results in TPS)

work_mem = '4GB';

Test    master   gen sort   compare
Test1   317.2    665.6      210%
Test2   228.6    388.9      170%
Test3   207.4    330.7      159%
Test4   185.5    279.4      151%
Test5   292.2    563.9      193%

If I drop work_mem down to the standard setting, the unpatched version goes to disk but the patched version does not, and the gains get a little bigger.

work_mem = '4MB';

Test    master   gen sort   compare
Test1   177.5    658.2      371%
Test2   149.7    385.2      257%
Test3   137.5    330.0      240%
Test4   129.0    275.1      213%
Test5   161.7    546.4      338%

The patch is just a simple 1-liner at the moment. I likely do need to adjust what I'm passing as the blockSize to GenerationContextCreate(). Maybe a better number would be something calculated from work_mem, e.g. Min(ALLOCSET_DEFAULT_MAXSIZE, ((Size) work_mem) * 64), so that we allocate at most a 16th of work_mem per chunk, but never more than 8MB. I don't think changing this will affect the performance of the above very much.

David
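For readers without the attachment, the 1-liner presumably amounts to something like the following change to the tuple context set up in tuplesort.c's tuplesort_begin_common(). This is only a sketch based on the PG14-era memory context APIs; the actual hunk in generation_tuplesort.patch and the blockSize it passes may differ:

    -   state->tuplecontext = AllocSetContextCreate(state->sortcontext,
    -                                               "Caller tuples",
    -                                               ALLOCSET_DEFAULT_SIZES);
    +   /*
    +    * generation.c chunks aren't rounded up to a power of 2 and carry a
    +    * smaller header, which suits tuplesort's allocate-in-order,
    +    * reset-in-bulk usage pattern.
    +    */
    +   state->tuplecontext = GenerationContextCreate(state->sortcontext,
    +                                                 "Caller tuples",
    +                                                 ALLOCSET_DEFAULT_MAXSIZE);

And the work_mem-derived block size suggested above would be roughly:

    /* work_mem is in kB, so work_mem * 64 bytes is a 16th of work_mem, capped at 8MB */
    Size    blockSize = Min(ALLOCSET_DEFAULT_MAXSIZE, ((Size) work_mem) * 64);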
#!/bin/bash

sec=10

# This should improve
echo "select * from tenk1 order by two offset 1000000" > bench.sql
echo Test1
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps

# This should improve
echo "select * from tenk1 order by tenthous offset 1000000" > bench.sql
echo Test2
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps

# This should improve
echo "select * from tenk1 order by stringu1 offset 1000000" > bench.sql
echo Test3
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps

# This should improve
echo "select * from tenk1 order by stringu2 offset 1000000" > bench.sql
echo Test4
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps

# This should improve
echo "select * from tenk1 order by twenty offset 1000000" > bench.sql
echo Test5
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps
pgbench -n -f bench.sql -T $sec regression | grep tps
generation_tuplesort.patch
Description: Binary data