Re: Fix for segfault in logical replication on master

Amit Kapila Thu, 17 Jun 2021 20:49:26 -0700

On Thu, Jun 17, 2021 at 9:26 PM Mark Dilger
<[email protected]> wrote:
>
> > On Jun 17, 2021, at 6:40 AM, Amit Kapila <[email protected]> wrote:
> >
> > I think such a problem won't happen because we are using historic
> > snapshots in this context. We rely on that in a similar way in
> > reorderbuffer.c, see ReorderBufferProcessTXN.
>
> I think you are right, but that's the part I have trouble fully convincing 
> myself is safe.  We certainly have an historic snapshot when we call 
> RelationGetIndexList, but that has an early exit if the relation already has 
> fields set, and we don't know if those fields were set before or after the 
> historic snapshot was taken.  Within the context of the pluggable 
> infrastructure, I think we're safe.  The only caller of 
> RelationGetIdentityKeyBitmap() in core is logicalrep_write_attrs(), which is 
> only called by logicalrep_write_rel(), which is only called by 
> send_relation_and_attrs(), which is only called by maybe_send_schema(), which 
> is called by pgoutput_change() and pgoutput_truncate(), both being callbacks 
> in core's logical replication plugin.
>
> ReorderBufferProcessTXN calls SetupHistoricSnapshot before opening the 
> relation and then calling ReorderBufferApplyChange to invoke the plugin on 
> that opened relation, so the relation's fields could not have been setup 
> before the snapshot was taken.  Any other plugin would similarly get invoked 
> after that same logic, so they'd be fine, too.  The problem would only be if 
> somebody called RelationGetIdentityKeyBitmap() or one of its calling 
> functions from outside that infrastructure.  Is that worth worrying about?  
> The function comments for those mention having an historic snapshot, and the 
> Assert will catch if code doesn't have one, but I wonder how much of a trap 
> for the unwary that is, considering that somebody might open the relation and 
> lookup indexes for the relation before taking an historic snapshot and 
> calling these functions.
>


I think in such a case the caller must call InvalidateSystemCaches
before setting up a historic snapshot, otherwise, there could be other
problems as well.

> I thought it was cheap enough to check that the relation we open is an index, 
> because if it is not, we'll segfault when accessing fields of the 
> relation->rd_index struct.  I wouldn't necessarily advocate doing any really 
> expensive checks here, but a quick sanity check seemed worth the effort.
>

I am not telling you anything about the cost of these sanity checks. I
suggest you raise elog rather than return NULL because if this happens
there is definitely some problem and continuing won't be a good idea.

-- 
With Regards,
Amit Kapila.

Re: Fix for segfault in logical replication on master

Reply via email to