Hi, I got a core dump when using hash join on a Postgres-derived database (Greenplum DB).
I found a way to reproduce it on Postgres.

Root cause: in ExecChooseHashTableSize(), commit b154ee63bb uses pg_nextpower2_size_t(), whose argument must not be 0:

```
sbuckets = pg_nextpower2_size_t(hash_table_bytes / bucket_size);
```

There is a potential risk that hash_table_bytes < bucket_size in some corner cases, in which case the integer division yields 0.

SQL to reproduce:

```
-- create a wide enough table to reproduce the bug
DO language 'plpgsql'
$$
DECLARE
    var_sql text := 'CREATE TABLE t_1600_columns('
        || string_agg('field' || i::text || ' varchar(255)', ',')
        || ');'
        FROM generate_series(1,1600) As i;
BEGIN
    EXECUTE var_sql;
END;
$$;

create table j1(field1 text);

set work_mem = 64;
set hash_mem_multiplier = 1;
set enable_nestloop = off;
set enable_mergejoin = off;

explain select * from j1 inner join t_1600_columns using(field1);

server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded
```

Part of the core dump file:

```
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=139769161559104) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=139769161559104) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=139769161559104, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007f1e8b3de476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007f1e8b3c47f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000558cc8884062 in ExceptionalCondition (conditionName=0x558cc8a21570 "num > 0 && num <= PG_UINT64_MAX / 2 + 1", errorType=0x558cc8a21528 "FailedAssertion", fileName=0x558cc8a21500 "../../../src/include/port/pg_bitutils.h", lineNumber=165) at assert.c:69
#6  0x0000558cc843bb16 in pg_nextpower2_64 (num=0) at ../../../src/include/port/pg_bitutils.h:165
#7  0x0000558cc843d13a in ExecChooseHashTableSize (ntuples=100, tupwidth=825086, useskew=true, try_combined_hash_mem=false, parallel_workers=0, space_allowed=0x7ffdcfa01598, numbuckets=0x7ffdcfa01588, numbatches=0x7ffdcfa0158c, num_skew_mcvs=0x7ffdcfa01590) at nodeHash.c:835
```

This patch fixes it easily:

```
-        sbuckets = pg_nextpower2_size_t(hash_table_bytes / bucket_size);
+        if (hash_table_bytes < bucket_size)
+            sbuckets = 1;
+        else
+            sbuckets = pg_nextpower2_size_t(hash_table_bytes / bucket_size);
```

Alternatively, we could report an error or a hint message telling users to increase work_mem / hash_mem_multiplier, but I think making the query work is better.

The issue exists on master, 15, 14, and 13.
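To make the corner case concrete, here is a small standalone C sketch (not the actual nodeHash.c code) of the arithmetic as I understand it: with work_mem = 64 (kB), hash_mem_multiplier = 1, and the projected tuple width of 825086 taken from the backtrace, a single bucket is already far larger than the whole hash_table_bytes budget, so the integer division passed to pg_nextpower2_size_t() is 0. The per-tuple overhead constant and the NTUP_PER_BUCKET value below are simplified assumptions for illustration, not the exact planner values.

```
/*
 * Standalone sketch of how hash_table_bytes / bucket_size can reach 0
 * in ExecChooseHashTableSize(); constants mirror the repro above.
 */
#include <stdio.h>
#include <stddef.h>

#define NTUP_PER_BUCKET 1			/* tuples per bucket (assumption) */

int
main(void)
{
	size_t		work_mem_kb = 64;						/* SET work_mem = 64 */
	size_t		hash_table_bytes = work_mem_kb * 1024;	/* 65536 bytes, hash_mem_multiplier = 1 */
	size_t		tupwidth = 825086;						/* very wide row, from the backtrace */
	size_t		tupsize = tupwidth + 48;				/* + assumed per-tuple overhead */
	size_t		bucket_size = tupsize * NTUP_PER_BUCKET + sizeof(void *);

	/*
	 * Integer division: 65536 / ~825142 == 0, which trips the Assert in
	 * pg_nextpower2_size_t(), whose argument must be greater than 0.
	 */
	printf("hash_table_bytes / bucket_size = %zu\n",
		   hash_table_bytes / bucket_size);
	return 0;
}
```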
Regards,
Zhang Mingli

Attachment: vn-0001-Fix-HashJoin-crash.patch