Hi David,

> Over the past few days I've been gathering some benchmark results
> together to show the sort performance improvements in PG15 [1].
> One of the test cases I did was to demonstrate Heikki's change to use
> a k-way merge (65014000b).

> The test I did to try this out was along the lines of:

> set max_parallel_workers_per_gather = 0;
> create table t (a bigint not null, b bigint not null, c bigint not
> null, d bigint not null, e bigint not null, f bigint not null);
> insert into t select x,x,x,x,x,x from generate_Series(1,140247142) x; -- 10GB!
> vacuum freeze t;

> The query I ran was:

> select * from t order by a offset 140247142;

I redid this test here:

Windows 10 64 bits
msvc 2019 64 bits
RAM 8GB
SSD 256 GB

HEAD (default configuration):
Time: 229396,551 ms (03:49,397)

PATCHED:
Time: 220887,346 ms (03:40,887)

> I tested various sizes of work_mem starting at 4MB and doubled that
> all the way to 16GB. For many of the smaller values of work_mem the
> performance is vastly improved by Heikki's change, however for
> work_mem = 64MB I detected quite a large slowdown. PG14 took 20.9
> seconds and PG15 beta 1 took 29 seconds!

> I've been trying to get to the bottom of this today and finally have
> discovered this is due to the tuple size allocations in the sort being
> exactly 64 bytes. Prior to 40af10b57 (Use Generation memory contexts
> to store tuples in sorts) the tuple for the sort would be stored in an
> aset context. After 40af10b57 we'll use a generation context. The
> idea with that change is that the generation context does no
> power-of-2 round ups for allocations, so we save memory in most cases.
> However, due to this particular test having a tuple size of 64-bytes,
> there was no power-of-2 wastage with aset.

> The problem is that generation chunks have a larger chunk header than
> aset do due to having to store the block pointer that the chunk
> belongs to so that GenerationFree() can increment the nfree chunks in
> the block. aset.c does not require this as freed chunks just go onto a
> freelist that's global to the entire context.

> Basically, for my test query, the slowdown is because instead of being
> able to store 620702 tuples per tape over 226 tapes with an aset
> context, we can now only store 576845 tuples per tape resulting in
> requiring 244 tapes when using the generation context.

> If I had added column "g" to make the tuple size 72 bytes causing
> aset's code to round allocations up to 128 bytes and generation.c to
> maintain the 72 bytes then the sort would have stored 385805 tuples
> over 364 batches for aset and 538761 tuples over 261 batches using the
> generation context. That would have been a huge win.

> So it basically looks like I discovered a very bad case that causes a
> significant slowdown. Yet other cases that are not an exact power of
> 2 stand to gain significantly from this change.

> One thing 40af10b57 does is stops those terrible performance jumps
> when the tuple size crosses a power-of-2 boundary. The performance
> should be more aligned to the size of the data being sorted now...
> Unfortunately, that seems to mean regressions for large sorts with
> power-of-2 sized tuples.

It seems to me that the solution would be to use aset allocations when
the size of the tuples is a power of 2?

if (state->sortopt & TUPLESORT_ALLOWBOUNDED ||
    (state->memtupsize & (state->memtupsize - 1)) == 0)
    state->tuplecontext = AllocSetContextCreate(state->sortcontext,
                                                "Caller tuples",
                                                ALLOCSET_DEFAULT_SIZES);
else
    state->tuplecontext = GenerationContextCreate(state->sortcontext,
                                                  "Caller tuples",
                                                  ALLOCSET_DEFAULT_SIZES);

I took a look and tried some improvements to see if I had a better result.
Would you mind taking a look and testing?
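As an aside, to make the chunk-header arithmetic above concrete, here is a
rough standalone sketch (not PostgreSQL code). The header sizes are
assumptions for a typical 64-bit, non-assert build: 16 bytes for an aset.c
chunk and 24 bytes for a generation.c chunk, the extra 8 bytes being the
block pointer you mention; aset.c additionally rounds small requests up to
the next power of 2, while generation.c stores the exact size.

/*
 * Illustration only; the header sizes are assumed, not taken from the
 * PostgreSQL headers.
 */
#include <stdio.h>
#include <stddef.h>

#define ASET_CHUNK_HDR  16      /* assumed aset.c chunk header */
#define GEN_CHUNK_HDR   24      /* assumed generation.c header (+block pointer) */

/* aset.c-style power-of-2 round up for small requests */
static size_t
pow2_roundup(size_t sz)
{
    size_t      p = 8;

    while (p < sz)
        p <<= 1;
    return p;
}

int
main(void)
{
    size_t      tuple_sizes[] = {64, 72};

    for (int i = 0; i < 2; i++)
    {
        size_t      sz = tuple_sizes[i];

        printf("tuple %3zu bytes: aset %3zu bytes/tuple, generation %3zu bytes/tuple\n",
               sz,
               ASET_CHUNK_HDR + pow2_roundup(sz),
               GEN_CHUNK_HDR + sz);
    }
    return 0;
}

Under those assumptions a 64-byte tuple costs 88 bytes per tuple in a
generation context versus 80 in an aset context (the regression case), while
a 72-byte tuple costs 96 versus 144, which is why the non-power-of-2 cases
gain so much from 40af10b57.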
regards,
Ranier Vilela
CREATE TABLE t_test (x numeric);
INSERT INTO t_test SELECT random() FROM generate_series(1, 5000000);
ANALYZE;
SHOW work_mem;

HEAD:
postgres=# explain analyze SELECT * FROM t_test ORDER BY x;
                                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Gather Merge  (cost=397084.73..883229.71 rows=4166666 width=11) (actual time=1326.281..2718.040 rows=5000000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Sort  (cost=396084.71..401293.04 rows=2083333 width=11) (actual time=1287.168..1520.520 rows=1666667 loops=3)
         Sort Key: x
         Sort Method: external merge  Disk: 24880kB
         Worker 0:  Sort Method: external merge  Disk: 24776kB
         Worker 1:  Sort Method: external merge  Disk: 23960kB
         ->  Parallel Seq Scan on t_test  (cost=0.00..47861.33 rows=2083333 width=11) (actual time=0.241..135.730 rows=1666667 loops=3)
 Planning Time: 0.054 ms
 Execution Time: 2837.789 ms
(11 rows)

PATCHED:
postgres=# explain analyze SELECT * FROM t_test ORDER BY x;
                                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Gather Merge  (cost=397084.73..883229.71 rows=4166666 width=11) (actual time=1283.818..2696.469 rows=5000000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Sort  (cost=396084.71..401293.04 rows=2083333 width=11) (actual time=1253.459..1485.734 rows=1666667 loops=3)
         Sort Key: x
         Sort Method: external merge  Disk: 25136kB
         Worker 0:  Sort Method: external merge  Disk: 24096kB
         Worker 1:  Sort Method: external merge  Disk: 24304kB
         ->  Parallel Seq Scan on t_test  (cost=0.00..47861.33 rows=2083333 width=11) (actual time=0.256..133.065 rows=1666667 loops=3)
 Planning Time: 0.055 ms
 Execution Time: 2816.495 ms
(11 rows)

David's benchmark:

set max_parallel_workers_per_gather = 0;
create table t (a bigint not null, b bigint not null, c bigint not
null, d bigint not null, e bigint not null, f bigint not null);
insert into t select x,x,x,x,x,x from generate_Series(1,140247142) x; -- 10GB!
vacuum freeze t;
select * from t order by a offset 140247142;

HEAD:
postgres=# select * from t order by a offset 140247142;
 a | b | c | d | e | f
---+---+---+---+---+---
(0 rows)

Time: 229396,551 ms (03:49,397)
Time: 231432,750 ms (03:51,433)

PATCHED:
postgres=# explain analyze select * from t order by a offset 140247142;
 a | b | c | d | e | f
---+---+---+---+---+---
(0 rows)

Time: 220887,346 ms (03:40,887)
Time: 222652,487 ms (03:42,652)

Ronan's benchmark:

Setup 1: single table, 1 000 000 tuples, no index

CREATE TABLE tbench (
    a int,
    b int
);
INSERT INTO tbench (a, b) SELECT a, b FROM generate_series(1, 100) a, generate_series(1, 10000) b;

Test 1: Single-column ordered select (order by b since the table is already sorted by a)

select b from tbench order by b;

HEAD:
postgres=# explain analyze select b from tbench order by b;
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=266422.45..271422.45 rows=2000000 width=4) (actual time=425.062..559.464 rows=2000000 loops=1)
   Sort Key: b
   Sort Method: external merge  Disk: 23528kB
   ->  Seq Scan on tbench  (cost=0.00..28850.00 rows=2000000 width=4) (actual time=0.056..168.422 rows=2000000 loops=1)
 Planning Time: 0.058 ms
 Execution Time: 605.315 ms
(6 rows)

PATCHED:
postgres=# explain analyze select b from tbench order by b;
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=266422.45..271422.45 rows=2000000 width=4) (actual time=421.642..557.750 rows=2000000 loops=1)
   Sort Key: b
   Sort Method: external merge  Disk: 23528kB
   ->  Seq Scan on tbench  (cost=0.00..28850.00 rows=2000000 width=4) (actual time=0.047..167.848 rows=2000000 loops=1)
 Planning Time: 0.080 ms
 Execution Time: 603.583 ms
(6 rows)

Test 2: Ordered sum (using b so that the input is not presorted)

select sum(b order by b) from tbench;

HEAD:
postgres=# explain analyze select sum(b order by b) from tbench;
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=33850.00..33850.01 rows=1 width=8) (actual time=547.316..547.317 rows=1 loops=1)
   ->  Seq Scan on tbench  (cost=0.00..28850.00 rows=2000000 width=4) (actual time=0.074..134.147 rows=2000000 loops=1)
 Planning Time: 0.060 ms
 Execution Time: 547.339 ms
(4 rows)

PATCHED:
postgres=# explain analyze select sum(b order by b) from tbench;
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=33850.00..33850.01 rows=1 width=8) (actual time=544.565..544.566 rows=1 loops=1)
   ->  Seq Scan on tbench  (cost=0.00..28850.00 rows=2000000 width=4) (actual time=0.044..134.305 rows=2000000 loops=1)
 Planning Time: 0.050 ms
 Execution Time: 544.633 ms
(4 rows)

Test 3: Ordered sum + group by

select b, sum(a order by a) from tbench GROUP BY b;

HEAD:
postgres=# explain analyze select b, sum(a order by a) from tbench GROUP BY b;
                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=266422.45..281522.48 rows=10003 width=12) (actual time=740.557..1201.470 rows=10000 loops=1)
   Group Key: b
   ->  Sort  (cost=266422.45..271422.45 rows=2000000 width=8) (actual time=740.490..905.244 rows=2000000 loops=1)
         Sort Key: b
         Sort Method: external merge  Disk: 35216kB
         ->  Seq Scan on tbench  (cost=0.00..28850.00 rows=2000000 width=8) (actual time=0.040..192.818 rows=2000000 loops=1)
 Planning Time: 0.080 ms
 Execution Time: 1206.248 ms
(8 rows)

PATCHED:
postgres=# explain analyze select b, sum(a order by a) from tbench GROUP BY b;
                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=266422.45..281522.48 rows=10003 width=12) (actual time=744.886..1196.399 rows=10000 loops=1)
   Group Key: b
   ->  Sort  (cost=266422.45..271422.45 rows=2000000 width=8) (actual time=744.822..906.223 rows=2000000 loops=1)
         Sort Key: b
         Sort Method: external merge  Disk: 35216kB
         ->  Seq Scan on tbench  (cost=0.00..28850.00 rows=2000000 width=8) (actual time=0.053..198.255 rows=2000000 loops=1)
 Planning Time: 0.084 ms
 Execution Time: 1200.807 ms
(8 rows)

Setup 2: same as before, but adding an index on (b, a)

CREATE INDEX ON tbench (b, a);

Test 2: Ordered sum

select sum(a order by a) from tbench;

HEAD:
postgres=# explain analyze select sum(a order by a) from tbench;
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=33850.00..33850.01 rows=1 width=8) (actual time=417.679..417.679 rows=1 loops=1)
   ->  Seq Scan on tbench  (cost=0.00..28850.00 rows=2000000 width=4) (actual time=0.036..134.878 rows=2000000 loops=1)
 Planning Time: 1.185 ms
 Execution Time: 417.732 ms
(4 rows)

PATCHED:
postgres=# explain analyze select sum(a order by a) from tbench;
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=33850.00..33850.01 rows=1 width=8) (actual time=421.826..421.827 rows=1 loops=1)
   ->  Seq Scan on tbench  (cost=0.00..28850.00 rows=2000000 width=4) (actual time=0.063..134.843 rows=2000000 loops=1)
 Planning Time: 0.070 ms
 Execution Time: 421.885 ms
(4 rows)

Test 3: Ordered sum + group by

select a, sum(b order by b) from tbench GROUP BY a;

HEAD:
postgres=# explain analyze select a, sum(b order by b) from tbench GROUP BY a;
                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=199979.56..214980.56 rows=100 width=12) (actual time=502.326..1048.130 rows=100 loops=1)
   Group Key: a
   ->  Sort  (cost=199979.56..204979.56 rows=2000000 width=8) (actual time=496.952..654.468 rows=2000000 loops=1)
         Sort Key: a
         Sort Method: external merge  Disk: 35216kB
         ->  Seq Scan on tbench  (cost=0.00..28850.00 rows=2000000 width=8) (actual time=0.064..136.186 rows=2000000 loops=1)
 Planning Time: 0.137 ms
 Execution Time: 1052.026 ms
(8 rows)

PATCHED:
postgres=# explain analyze select a, sum(b order by b) from tbench GROUP BY a;
                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=199979.56..214980.56 rows=100 width=12) (actual time=485.994..1033.928 rows=100 loops=1)
   Group Key: a
   ->  Sort  (cost=199979.56..204979.56 rows=2000000 width=8) (actual time=480.617..637.878 rows=2000000 loops=1)
         Sort Key: a
         Sort Method: external merge  Disk: 35216kB
         ->  Seq Scan on tbench  (cost=0.00..28850.00 rows=2000000 width=8) (actual time=0.066..136.371 rows=2000000 loops=1)
 Planning Time: 0.117 ms
 Execution Time: 1037.586 ms
(8 rows)
002-improve-sort.patch