>We should probably do 
>HASH_ENTER<https://github.com/postgres/postgres/blob/1509c6fc29c07d13c9a590fbd6f37c7576f58ba6/src/backend/utils/cache/typcache.c#L1974>
> only after we have a valid entry so that we don't end up with a NULL entry in 
>the cache even if an intermediate error happens. I will share a fix in this 
>thread soon.

I am attaching this patch.






________________________________
From: Sait Talha Nisanci
Sent: Wednesday, March 31, 2021 11:50 AM
To: pgsql-hackers <pgsql-hack...@postgresql.org>
Cc: Metin Doslu <metin.do...@microsoft.com>
Subject: Crash in record_type_typmod_compare

Hello,

In citus, we have seen the following crash backtraces because of a NULL 
tupledesc multiple times and we weren't sure if this was related to citus or 
postgres:


#0  equalTupleDescs (tupdesc1=0x0, tupdesc2=0x1b9f3f0) at tupdesc.c:417
417     tupdesc.c: No such file or directory.
#0  equalTupleDescs (tupdesc1=0x0, tupdesc2=0x1b9f3f0) at tupdesc.c:417
#1  0x000000000085b51f in record_type_typmod_compare (a=<optimized out>, 
b=<optimized out>, size=<optimized out>) at typcache.c:1761
#2  0x0000000000869c73 in hash_search_with_hash_value (hashp=0x1c10530, 
keyPtr=keyPtr@entry=0x7ffcfd3117b8, hashvalue=3194332168, 
action=action@entry=HASH_ENTER, foundPtr=foundPtr@entry=0x7ffcfd3117c0) at 
dynahash.c:987
#3  0x000000000086a3fd in hash_search (hashp=<optimized out>, 
keyPtr=keyPtr@entry=0x7ffcfd3117b8, action=action@entry=HASH_ENTER, 
foundPtr=foundPtr@entry=0x7ffcfd3117c0) at dynahash.c:911
#4  0x000000000085d0e1 in assign_record_type_typmod (tupDesc=<optimized out>, 
tupDesc@entry=0x1b9f3f0) at typcache.c:1801
#5  0x000000000061832b in BlessTupleDesc (tupdesc=0x1b9f3f0) at 
execTuples.c:2056
#6  TupleDescGetAttInMetadata (tupdesc=0x1b9f3f0) at execTuples.c:2081
#7  0x00007f2701878dee in CreateDistributedExecution 
(modLevel=ROW_MODIFY_READONLY, taskList=0x1c82398, hasReturning=<optimized 
out>, paramListInfo=0x1c3e5a0, tupleDescriptor=0x1b9f3f0, tupleStore=<optimized 
out>, targetPoolSize=16, xactProperties=0x7ffcfd311960, jobIdList=0x0) at 
executor/adaptive_executor.c:951
#8  0x00007f270187ba09 in AdaptiveExecutor (scanState=0x1b9eff0) at 
executor/adaptive_executor.c:676
#9  0x00007f270187c582 in CitusExecScan (node=0x1b9eff0) at 
executor/citus_custom_scan.c:182
#10 0x000000000060c9e2 in ExecProcNode (node=0x1b9eff0) at 
../../../src/include/executor/executor.h:239
#11 ExecutePlan (execute_once=<optimized out>, dest=0x1abfc90, 
direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, 
operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1b9eff0, 
estate=0x1b9ed50) at execMain.c:1646
#12 standard_ExecutorRun (queryDesc=0x1c3e660, direction=<optimized out>, 
count=0, execute_once=<optimized out>) at execMain.c:364
#13 0x00007f27018819bd in CitusExecutorRun (queryDesc=0x1c3e660, 
direction=ForwardScanDirection, count=0, execute_once=true) at 
executor/multi_executor.c:177
#14 0x00007f27000adfee in pgss_ExecutorRun (queryDesc=0x1c3e660, 
direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at 
pg_stat_statements.c:891
#15 0x000000000074f97d in PortalRunSelect (portal=portal@entry=0x1b8ed00, 
forward=forward@entry=true, count=0, count@entry=9223372036854775807, 
dest=dest@entry=0x1abfc90) at pquery.c:929
#16 0x0000000000750df0 in PortalRun (portal=portal@entry=0x1b8ed00, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, 
run_once=<optimized out>, dest=dest@entry=0x1abfc90, 
altdest=altdest@entry=0x1abfc90, completionTag=0x7ffcfd312090 "") at 
pquery.c:770
#17 0x000000000074e745 in exec_execute_message (max_rows=9223372036854775807, 
portal_name=0x1abf880 "") at postgres.c:2090
#18 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1b4a0e8, 
dbname=<optimized out>, username=<optimized out>) at postgres.c:4308
#19 0x00000000006de9d8 in BackendRun (port=0x1b37230, port=0x1b37230) at 
postmaster.c:4437
#20 BackendStartup (port=0x1b37230) at postmaster.c:4128
#21 ServerLoop () at postmaster.c:1704
#22 0x00000000006df955 in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x1aba280) at postmaster.c:1377
#23 0x0000000000487a4e in main (argc=3, argv=0x1aba280) at main.c:228

This is the issue: https://github.com/citusdata/citus/issues/3825


I think this is related to postgres because of the following events:

  *   In 
assign_record_type_typmod<https://github.com/postgres/postgres/blob/1509c6fc29c07d13c9a590fbd6f37c7576f58ba6/src/backend/utils/cache/typcache.c#L1984>
 tupledesc will be set to NULL if it is not in the cache and it will be set to 
an actual value in this 
line<https://github.com/postgres/postgres/blob/1509c6fc29c07d13c9a590fbd6f37c7576f58ba6/src/backend/utils/cache/typcache.c#L1998>.
  *   It is possible that postgres will error in between these two lines, hence 
leaving a NULL tupledesc in the cache. For example in 
find_or_make_matching_shared_tupledesc<https://github.com/postgres/postgres/blob/1509c6fc29c07d13c9a590fbd6f37c7576f58ba6/src/backend/utils/cache/typcache.c#L1988>.
 (Possibly because of OOM)
  *   Now there is a NULL tupledesc in the hash table, hence when doing a 
comparison in 
record_type_typmod_compare<https://github.com/postgres/postgres/blob/1509c6fc29c07d13c9a590fbd6f37c7576f58ba6/src/backend/utils/cache/typcache.c#L1935>,
 it will crash.

I have manually added a line to error in 
"find_or_make_matching_shared_tupledesc" and I was able to get a similar crash 
with two subsequent simple SELECT queries. You can see the backtrace in the 
issue<https://github.com/citusdata/citus/issues/3825#issuecomment-805627864>.

We should probably do 
HASH_ENTER<https://github.com/postgres/postgres/blob/1509c6fc29c07d13c9a590fbd6f37c7576f58ba6/src/backend/utils/cache/typcache.c#L1974>
 only after we have a valid entry so that we don't end up with a NULL entry in 
the cache even if an intermediate error happens. I will share a fix in this 
thread soon.

Best,
Talha.

Attachment: 0001-fix_crash_in_typmod_compare.patch
Description: 0001-fix_crash_in_typmod_compare.patch

Reply via email to