Hi, all

I hit a crash when copying a large amount of data into partitioned tables
in Cloudberry DB[0] (based on Postgres 14.4, Greenplum 7).

I tested on Postgres and it has a similar issue (the crash is at a different
place but in the same function).

However, it is a little hard to reproduce, because it only happens when the
next tuple is inserted after a previous COPY multi-insert buffer has been
flushed.

To reproduce it easily, change the macros to:

#define MAX_BUFFERED_TUPLES     1
#define MAX_PARTITION_BUFFERS   0

Configure and make install; then, when running initdb, a core dump is produced
with this backtrace:

#0 0x000055de617211b9 in CopyMultiInsertInfoNextFreeSlot 
(miinfo=0x7ffce496d360, rri=0x55de6368ba88)
 at copyfrom.c:592
#1 0x000055de61721ff1 in CopyFrom (cstate=0x55de63592ce8) at copyfrom.c:985
#2 0x000055de6171dd86 in DoCopy (pstate=0x55de63589e00, stmt=0x55de635347d8, 
stmt_location=0, stmt_len=195,
 processed=0x7ffce496d590) at copy.c:306
#3 0x000055de61ad7ce8 in standard_ProcessUtility (pstmt=0x55de635348a8,
 queryString=0x55de63533960 "COPY information_schema.sql_features (feature_id, 
feature_name, sub_feature_id, sub
_feature_name, is_supported, comments) FROM 
E'/home/gpadmin/install/pg17/share/postgresql/sql_features.txt';\n",
 readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, 
queryEnv=0x0, dest=0x55de620b0ce0 <debugtupDR>,
 qc=0x7ffce496d910) at utility.c:735
#4 0x000055de61ad7614 in ProcessUtility (pstmt=0x55de635348a8,
 queryString=0x55de63533960 "COPY information_schema.sql_features (feature_id, 
feature_name, sub_feature_id, sub
_feature_name, is_supported, comments) FROM 
E'/home/gpadmin/install/pg17/share/postgresql/sql_features.txt';\n",
 readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, 
queryEnv=0x0, dest=0x55de620b0ce0 <debugtupDR>,
 qc=0x7ffce496d910) at utility.c:523
#5 0x000055de61ad5e8f in PortalRunUtility (portal=0x55de633dd7a0, 
pstmt=0x55de635348a8, isTopLevel=true,
 setHoldSnapshot=false, dest=0x55de620b0ce0 <debugtupDR>, qc=0x7ffce496d910) at 
pquery.c:1158
#6 0x000055de61ad6106 in PortalRunMulti (portal=0x55de633dd7a0, 
isTopLevel=true, setHoldSnapshot=false,
 dest=0x55de620b0ce0 <debugtupDR>, altdest=0x55de620b0ce0 <debugtupDR>, 
qc=0x7ffce496d910) at pquery.c:1315
#7 0x000055de61ad5550 in PortalRun (portal=0x55de633dd7a0, 
count=9223372036854775807, isTopLevel=true,
 run_once=true, dest=0x55de620b0ce0 <debugtupDR>, altdest=0x55de620b0ce0 
<debugtupDR>, qc=0x7ffce496d910)
 at pquery.c:791


The root cause is: CopyMultiInsertInfoFlush() may be called to flush the
buffers while tuples are being copied. Afterwards, e.g. when inserting the
next tuple, CopyMultiInsertInfoNextFreeSlot() crashes because the buffer
pointer is NULL.

To fix it: instead of calling CopyMultiInsertInfoSetupBuffer() from outside, I
moved the call into CopyMultiInsertInfoNextFreeSlot() to avoid such issues.
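
To illustrate the idea, here is a minimal standalone sketch, not the patch
itself. ModelBuffer, ModelRelInfo and the model_* functions are hypothetical,
simplified stand-ins for the real copyfrom.c structures and functions; the
only point is that the slot accessor sets up the buffer on demand, so a
preceding flush that released the buffer can no longer leave a NULL pointer
to be dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; NOT the real PostgreSQL structs. */
typedef struct ModelBuffer
{
    int         nused;      /* slots handed out since the buffer was set up */
} ModelBuffer;

typedef struct ModelRelInfo
{
    ModelBuffer *buffer;    /* per-relation buffer pointer, NULL if none */
} ModelRelInfo;

/* Plays the role of CopyMultiInsertInfoSetupBuffer(): allocate a buffer. */
static void
model_setup_buffer(ModelRelInfo *rri)
{
    rri->buffer = calloc(1, sizeof(ModelBuffer));
    if (rri->buffer == NULL)
        abort();
}

/*
 * Plays the role of a flush that also releases the buffer, which is the
 * situation described above: afterwards the pointer is NULL.
 */
static void
model_flush(ModelRelInfo *rri)
{
    free(rri->buffer);
    rri->buffer = NULL;
}

/*
 * Plays the role of CopyMultiInsertInfoNextFreeSlot() with the proposed
 * change: set up the buffer here if it is missing, then hand out a slot.
 * Returns the index of the slot handed out.
 */
static int
model_next_free_slot(ModelRelInfo *rri)
{
    if (rri->buffer == NULL)
        model_setup_buffer(rri);

    assert(rri->buffer != NULL);
    return rri->buffer->nused++;
}
```

With this shape, calling model_next_free_slot() right after model_flush()
simply starts a fresh buffer; callers do not have to remember to set one up
first.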

[0] https://github.com/cloudberrydb/cloudberrydb


Zhang Mingli
www.hashdata.xyz

Attachment: v0-0001-Fix-COPY-FROM-crash-due-to-buffer-flush.patch
Description: Binary data
