On 30/03/18 09:56, Nikhil Sontakke wrote: > >> I as wondering how to hide this. Best idea I had so far would be to put >> it in heap_beginscan (and index_beginscan given that catalog scans use >> is as well) behind some condition. That would also improve performance >> because locking would not need to happen for syscache hits. The problem >> is however how to inform the heap_beginscan about the fact that we are >> in 2PC decoding. We definitely don't want to change all the scan apis >> for this. I wonder if we could add some kind of property to Snapshot >> which would indicate this fact - logical decoding is using it's own >> snapshots it could inject the information about being inside the 2PC >> decoding. >> > > The idea of adding that info in the Snapshot itself is interesting. We > could introduce a logicalxid field in SnapshotData to point to the XID > that the decoding backend is interested in. This could be added only > for the 2PC case. Support in the future for in-progress transactions > could use this field as well. If it's a valid XID, we could call > LogicalLockTransaction/LogicalUnlockTransaction on that XID from > heap_beginscan/head_endscan respectively. I can also look at what > other *_beginscan APIs would need this as well. >
So I have spent some significant time today thinking about this (the issue in general not this specific idea). And I think this proposal does not work either. The problem is that we fundamentally want two things, not one. It's true we want to block ABORT from finishing while we are reading catalogs, but the other important part is that we want to bail gracefully when ABORT happened for the transaction being decoded. In other words,, if we do the locking transparently somewhere in the scan or catalog read or similar there is no way to let the plugin know that it should bail. So the locking code that's called from several layers deep would have only one option, to ERROR. I don't think we want to throw ERRORs when transaction which is being decoded has been aborted as that disrupts the replication. I think that we basically only have two options here that can satisfy both blocking ABORT and bailing gracefully in case ABORT has happened. Either the plugin has full control over locking (as in the patch), so that it can bail when the locking function reports that the transaction has aborted. Or we do the locking around the plugin calls, ie directly in logical decoding callback wrappers or similar. Both of these options have some disadvantages. Locking inside plugin make the plugin code much more complex if it wants to support this. For example if I as plugin author call any function that somewhere access syscache, I have to do the locking around that function call. Locking around plugin callbacks can hold he lock for longer periods of time since plugins usually end up writing to network. I think for most use-cases of 2PC decoding the latter is more useful as plugin should be connected to some kind transaction management solution. Also the time should be bounded by things like wal_sender_timeout (or statement_timeout for SQL variant of decoding). Note that I was initially advocating against locking around whole callbacks when Nikhil originally came up with the idea, but after we went over several other options here and given it a lot of thought I now think it's probably least bad way we have available. At least until somebody figures out how to solve all the issues around reading aborted catalog changes, but that does seem like rather large project on its own. And if we do locking around plugin callbacks now then we can easily switch to that solution if it ever happens without anybody having to rewrite the plugins. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services