Re: Fix for segfault in logical replication on master

Mark Dilger Thu, 17 Jun 2021 08:57:09 -0700


> On Jun 17, 2021, at 6:40 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> 
> I think such a problem won't happen because we are using historic
> snapshots in this context. We rely on that in a similar way in
> reorderbuffer.c, see ReorderBufferProcessTXN.


I think you are right, but that's the part I have trouble fully convincing 
myself is safe.  We certainly have an historic snapshot when we call 
RelationGetIndexList, but that has an early exit if the relation already has 
fields set, and we don't know if those fields were set before or after the 
historic snapshot was taken.  Within the context of the pluggable 
infrastructure, I think we're safe.  The only caller of 
RelationGetIdentityKeyBitmap() in core is logicalrep_write_attrs(), which is 
only called by logicalrep_write_rel(), which is only called by 
send_relation_and_attrs(), which is only called by maybe_send_schema(), which 
is called by pgoutput_change() and pgoutput_truncate(), both being callbacks in 
core's logical replication plugin.

ReorderBufferProcessTXN calls SetupHistoricSnapshot before opening the relation 
and then calling ReorderBufferApplyChange to invoke the plugin on that opened 
relation, so the relation's fields could not have been setup before the 
snapshot was taken.  Any other plugin would similarly get invoked after that 
same logic, so they'd be fine, too.  The problem would only be if somebody 
called RelationGetIdentityKeyBitmap() or one of its calling functions from 
outside that infrastructure.  Is that worth worrying about?  The function 
comments for those mention having an historic snapshot, and the Assert will 
catch if code doesn't have one, but I wonder how much of a trap for the unwary 
that is, considering that somebody might open the relation and lookup indexes 
for the relation before taking an historic snapshot and calling these functions.

I thought it was cheap enough to check that the relation we open is an index, 
because if it is not, we'll segfault when accessing fields of the 
relation->rd_index struct.  I wouldn't necessarily advocate doing any really 
expensive checks here, but a quick sanity check seemed worth the effort.  If 
you don't want to commit that part, I'm not going to put up a huge fuss.

Since neither of you knew why I was performing that check, it is clear that my 
code comment was insufficient.  I have added a more detailed code comment to 
explain the purpose of the check.  I also changed the first check to use 
RelationIsValid(), as suggested upthread.

v2-0001-Fixing-bug-in-logical-replication.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Fix for segfault in logical replication on master

Reply via email to