On Wed, 11 Aug 2021 at 00:10, Yura Sokolov <y.soko...@postgrespro.ru> wrote:
> Attached v2.
Eyeballing this it looked fine, but I was a little nervous about
back-patching it without testing it properly, so I ended up
provisioning a machine with 256GB of RAM and doing a round of testing.

I just created the simplest table I could:

create table a (a bigserial, b int);

and inserted 2^31 rows:

insert into a (b) values(1);
insert into a (b) select b from a; -- repeated until I got 2^31 rows

set work_mem = '256GB';
set max_parallel_workers_per_gather = 0;

I could recreate the issue described with the following query:

explain (analyze, timing off) select a from a group by a;

After watching perf top for a while, it switched to:

98.90%  postgres  [.] tuplehash_grow
 0.36%  [kernel]  [k] change_p4d_range
 0.24%  postgres  [.] LookupTupleHashEntry
 0.09%  postgres  [.] tts_minimal_store_tuple
 0.07%  [kernel]  [k] vm_normal_page
 0.02%  [kernel]  [k] __softirqentry_text_start
 0.02%  postgres  [.] heap_fill_tuple
 0.02%  postgres  [.] AllocSetAlloc

After patching, I got:

explain (analyze, timing off) select a from a group by a;
                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=35149810.71..53983243.28 rows=1883343257 width=8) (actual rows=2147483648 loops=1)
   Group Key: a
   Batches: 1  Memory Usage: 201334801kB
   ->  Seq Scan on a  (cost=0.00..30441452.57 rows=1883343257 width=8) (actual rows=2147483648 loops=1)
 Planning Time: 0.105 ms
 Execution Time: 2173480.905 ms
(6 rows)

Time: 2173482.166 ms (36:13.482)

And since I only had 256GB of memory on this machine and couldn't
really do 2^32 groups, I dropped SH_FILLFACTOR to 0.4 and
SH_MAX_FILLFACTOR to 0.48 and tried again to ensure I got the "hash
table size exceeded" message:

postgres=# explain (analyze on, timing off) select a from a group by a;
ERROR:  hash table size exceeded
Time: 1148554.672 ms (19:08.555)

After doing that, I felt a bit better about back-patching it, so I did.

Thanks for the patch.

David
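
P.S. For anyone wanting to reproduce this, the repeated doubling INSERT
is easy to script rather than run by hand; a minimal PL/pgSQL sketch
(illustrative only, assuming the same single-row starting point as
shown above):

do $$
begin
  -- each pass doubles the row count, so 31 passes turn the
  -- initial single row into 2^31 (2147483648) rows
  for i in 1..31 loop
    insert into a (b) select b from a;
  end loop;
end;
$$;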