On Thu, Apr 15, 2021 at 10:56 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapil...@gmail.com> writes:
> > On Thu, Apr 15, 2021 at 4:00 PM Japin Li <japi...@hotmail.com> wrote:
> >>
> >> The RelationIdGetRelation() comment says:
> >>
> > Caller should eventually decrement count. (Usually,
> > that happens by calling RelationClose().)
> >>
> >> However, it doesn't do it in ReorderBufferProcessTXN().
> >> I think we should close it, here is a patch that fixes it. Thoughts?
> >>
>
> > +1. Your fix looks correct to me but can we test it in some way?
>
> I think this code has a bigger problem: it should not be using
> RelationIdGetRelation and RelationClose directly.  99.44% of
> the backend goes through relation_open or one of the other
> relation.c wrappers, so why doesn't this?
>

I think it is because relation_open expects either caller to have a
lock on the relation or don't use 'NoLock' lockmode. AFAIU, we don't
need to acquire a lock on relation while decoding changes from WAL
because it uses a historic snapshot to build a relcache entry and all
the later changes to the rel are absorbed while decoding WAL.

I think it is also important to *not* acquire any lock on relation
otherwise it can lead to some sort of deadlock or infinite wait in the
decoding process. Consider a case for operations like Truncate (or if
the user has acquired an exclusive lock on the relation in some other
way say via Lock command) which acquires an exclusive lock on
relation, it won't get replicated in synchronous mode (when
synchronous_standby_name is configured). The truncate operation will
wait for the transaction to be replicated to the subscriber and the
decoding process will wait for the Truncate operation to finish.

> Possibly the answer is "it copied the equally misguided code
> in pgoutput.c".
>

I think it is following what is done during decoding, otherwise, it
will lead to the problems as described above. We are already
discussing one of the similar problems [1] where pgoutput
unintentionally acquired a lock on the index and lead to a sort of
deadlock.

If the above understanding is correct, I think we might want to
improve comments in this area.

[1] - 
https://www.postgresql.org/message-id/OS0PR01MB6113C2499C7DC70EE55ADB82FB759%40OS0PR01MB6113.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.


Reply via email to