On Monday, May 17, 2021 6:45 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > On Fri, May 14, 2021 at 2:20 PM osumi.takami...@fujitsu.com > <osumi.takami...@fujitsu.com> wrote: > > > > On Thursday, May 13, 2021 7:21 PM Amit Kapila > <amit.kapil...@gmail.com> wrote: > > > I don't think we can reproduce it with core plugins as they don't > > > lock user catalog tables. > > OK. My current understanding about how the deadlock happens is below. > > > > 1. TRUNCATE command is performed on user_catalog_table. > > 2. TRUNCATE command locks the table and index with ACCESS > EXCLUSIVE LOCK. > > 3. TRUNCATE waits for the subscriber's synchronization > > when synchronous_standby_names is set. > > 4. Here, the walsender stops, *if* it tries to acquire a lock on the > user_catalog_table > > because the table where it wants to see is locked by the > TRUNCATE already. > > > > If this is right, > > > > Yeah, the above steps are correct, so if we take a lock on user_catalog_table > when walsender is processing the WAL, it would lead to a problem. > > > we need to go back to a little bit higher level discussion, since > > whether we should hack any plugin to simulate the deadlock caused by > > user_catalog_table reference with locking depends on the assumption if > the plugin takes a lock on the user_catalog_table or not. > > In other words, if the plugin does read only access to that type of > > table with no lock (by RelationIdGetRelation for example ?), the > > deadlock concern disappears and we don't need to add anything to plugin > sides, IIUC. > > > > True, if the plugin doesn't acquire any lock on user_catalog_table, then it is > fine but we don't prohibit plugins to acquire locks on user_catalog_tables. > This is similar to system catalogs, the plugins and decoding code do acquire > lock on those. Thanks for sharing this. I'll take the idea that plugin can take a lock on user_catalog_table into account.
> > Here, we haven't gotten any response about whether output plugin takes > > (should take) the lock on the user_catalog_table. But, I would like to > > make a consensus about this point before the implementation. > > > > By the way, Amit-san already mentioned the main reason of this is that > > we allow decoding of TRUNCATE operation for user_catalog_table in > synchronous mode. > > The choices are provided by Amit-san already in the past email in [1]. > > (1) disallow decoding of TRUNCATE operation for user_catalog_tables > > (2) disallow decoding of any operation for user_catalog_tables like > > system catalog tables > > > > Yet, I'm not sure if either option solves the deadlock concern completely. > > If application takes an ACCESS EXCLUSIVE lock by LOCK command (not > by > > TRUNCATE !) on the user_catalog_table in a transaction, and if the > > plugin tries to take a lock on it, I think the deadlock happens. Of > > course, having a consensus that the plugin takes no lock at all would > remove this concern, though. > > > > This is true for system catalogs as well. See the similar report [1] > > > Like this, I'd like to discuss those two items in question together at > > first. > > * the plugin should take a lock on user_catalog_table or not > > * the range of decoding related to user_catalog_table > > > > To me, taking no lock on the user_catalog_table from plugin is fine > > > > We allow taking locks on system catalogs, so why prohibit > user_catalog_tables? However, I agree that if we want plugins to acquire the > lock on user_catalog_tables then we should either prohibit decoding of such > relations or do something else to avoid deadlock hazards. OK. Although we have not concluded the range of logical decoding of user_catalog_table (like we should exclude TRUNCATE command only or all operations on that type of table), I'm worried that disallowing the logical decoding of user_catalog_table produces the deadlock still. It's because disabling it by itself does not affect the lock taken by TRUNCATE command. What I have in mind is an example below. (1) plugin (e.g. pgoutput) is designed to take a lock on user_catalog_table. (2) logical replication is set up in synchronous mode. (3) TRUNCATE command takes an access exclusive lock on the user_catalog_table. (4) This time, we don't do anything for the TRUNCATE decoding. (5) the plugin tries to take a lock on the truncated table but, it can't due to the lock by TRUNCATE command. I was not sure that the place where the plugin takes the lock is in truncate_cb or somewhere else not directly related to decoding of the user_catalog_table itself, so I might be wrong. However, in this case, the solution would be not disabling the decoding of user_catalog_table but prohibiting TRUNCATE command on user_catalog_table in synchronous_mode. If this is true, I need to extend an output plugin and simulate the deadlock first and remove it by fixing the TRUNCATE side. Thoughts ? Best Regards, Takamichi Osumi