On Thu, Apr 15, 2021 at 10:56 PM Tom Lane <t...@sss.pgh.pa.us> wrote: > > Amit Kapila <amit.kapil...@gmail.com> writes: > > On Thu, Apr 15, 2021 at 4:00 PM Japin Li <japi...@hotmail.com> wrote: > >> > >> The RelationIdGetRelation() comment says: > >> > > Caller should eventually decrement count. (Usually, > > that happens by calling RelationClose().) > >> > >> However, it doesn't do it in ReorderBufferProcessTXN(). > >> I think we should close it, here is a patch that fixes it. Thoughts? > >> > > > +1. Your fix looks correct to me but can we test it in some way? > > I think this code has a bigger problem: it should not be using > RelationIdGetRelation and RelationClose directly. 99.44% of > the backend goes through relation_open or one of the other > relation.c wrappers, so why doesn't this? >
I think it is because relation_open expects either caller to have a lock on the relation or don't use 'NoLock' lockmode. AFAIU, we don't need to acquire a lock on relation while decoding changes from WAL because it uses a historic snapshot to build a relcache entry and all the later changes to the rel are absorbed while decoding WAL. I think it is also important to *not* acquire any lock on relation otherwise it can lead to some sort of deadlock or infinite wait in the decoding process. Consider a case for operations like Truncate (or if the user has acquired an exclusive lock on the relation in some other way say via Lock command) which acquires an exclusive lock on relation, it won't get replicated in synchronous mode (when synchronous_standby_name is configured). The truncate operation will wait for the transaction to be replicated to the subscriber and the decoding process will wait for the Truncate operation to finish. > Possibly the answer is "it copied the equally misguided code > in pgoutput.c". > I think it is following what is done during decoding, otherwise, it will lead to the problems as described above. We are already discussing one of the similar problems [1] where pgoutput unintentionally acquired a lock on the index and lead to a sort of deadlock. If the above understanding is correct, I think we might want to improve comments in this area. [1] - https://www.postgresql.org/message-id/OS0PR01MB6113C2499C7DC70EE55ADB82FB759%40OS0PR01MB6113.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.