On Wed, Feb 19, 2020 at 08:16:36PM +0100, Tomas Vondra wrote:
> 5) Assert(nbuckets > 0);
> ...
> This however quickly fails on this assert in BuildTupleHashTableExt (see
> backtrace1.txt):
>
>     Assert(nbuckets > 0);
>
> The value is computed in hash_choose_num_buckets, and there seem to be
> no protections against returning bogus values like 0. So maybe we should
> return
>
>     Min(nbuckets, 1024)
>
> or something like that, similarly to hash join. OTOH maybe it's simply
> due to agg_refill_hash_table() passing bogus values to the function?
>
>
> 6) Another thing that occurred to me was what happens to grouping sets,
> which we can't spill to disk. So I did this:
>
>   create table t2 (a int, b int, c int);
>
>   -- run repeatedly, until there are about 20M rows in t2 (1GB)
>   with tx as (select array_agg(a) as a, array_agg(b) as b
>                 from (select a, b from t order by random()) foo),
>        ty as (select array_agg(a) AS a
>                 from (select a from t order by random()) foo)
>   insert into t2 select unnest(tx.a), unnest(ty.a), unnest(tx.b)
>     from tx, ty;
>
>   analyze t2;
> ...
>
> which fails with segfault at execution time:
>
>   tuplehash_start_iterate (tb=0x18, iter=iter@entry=0x2349340)
>   870     for (i = 0; i < tb->size; i++)
>   (gdb) bt
>   #0  tuplehash_start_iterate (tb=0x18, iter=iter@entry=0x2349340)
>   #1  0x0000000000654e49 in agg_retrieve_hash_table_in_memory ...
>
> That's not surprising, because 0x18 pointer is obviously bogus. I guess
> this is simply an offset 18B added to a NULL pointer?
I did some investigation. Did you disable the assert when this panic happened?
If so, it's the same issue as 5) above (nbuckets == 0): a zero size gets passed
to the allocator when that hashtable is created, which is how it ends up as the
bogus 0x18 pointer. Sorry, my testing environment is acting up right now, so I
haven't reproduced it yet.

--
Adam Lee
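PS: For illustration, here is a rough sketch of the clamp idea from 5), similar
in spirit to the hash join sizing code. The function shape, the Size/Min/Max
helpers from the usual postgres.h environment, and the 1024 floor are my
assumptions for this sketch, not the actual patch code:

/* Sketch only: make sure hash_choose_num_buckets() can never return 0. */
static long
hash_choose_num_buckets(double hashentrysize, long ngroups, Size memory)
{
    long    nbuckets = ngroups;
    long    max_nbuckets = (long) (memory / hashentrysize);

    /* stay within the memory budget */
    nbuckets = Min(nbuckets, max_nbuckets);

    /*
     * Never return 0, so the Assert in BuildTupleHashTableExt (and the
     * size-0 allocation behind the bogus 0x18 pointer) can't fire.
     * 1024 mirrors the hash join floor; even Max(nbuckets, 1) would do.
     */
    return Max(nbuckets, 1024);
}

Whether agg_refill_hash_table() should also validate what it passes in is a
separate question.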