Hi all,

> I think Nikhils has done some significant work on this patch.
> Hopefully he'll be able to share it.
PFA, latest patch. This builds on top of the last patch submitted by Sokolov Yura and adds the actual logical replication interfaces to allow PREPARE or COMMIT/ROLLBACK PREPARED on a logical subscriber.

I tested with latest PG head by setting up PUBLICATION/SUBSCRIPTION for some tables. I tried DML on these tables via 2PC and it seems to work, with subscribers honoring COMMIT|ROLLBACK PREPARED commands.

Now getting back to the two main issues that we have been discussing:

Logical decoding deadlocking/hanging due to locks on catalog tables
====================================================

When we are decoding, we do not hold long-term locks on the table. We do RelationIdGetRelation() and RelationClose(), which increment/decrement ref counts. This ref count is held/released per ReorderBuffer change record. The call to RelationIdGetRelation() holds an AccessShareLock on pg_class, pg_attribute etc. while building the relation descriptor. The plugin itself can access rel/syscache, but none of that holds a lock stronger than AccessShareLock on the catalog tables.

Even activities like:

    ALTER user_table;
    CLUSTER user_table;

do not hold locks that would cause decoding to stall.

The only issue could be with locks on catalog objects themselves in the prepared transaction.

Now if the 2PC transaction is taking an AccessExclusiveLock on catalog objects via "LOCK pg_class" for example, then pretty much nothing else will progress in other sessions in the database until this active session COMMIT PREPAREDs or aborts this 2PC transaction.

Also, in some cases like CLUSTER on catalog objects, the code explicitly denies preparation of a 2PC transaction:

    postgres=# BEGIN;
    postgres=# CLUSTER pg_class using pg_class_oid_index ;
    postgres=# PREPARE TRANSACTION 'test_prepared_lock';
    ERROR:  cannot PREPARE a transaction that modified relation mapping

This makes sense because we do not want to get into a state where the DB is unable to progress meaningfully at all.
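To make the locking argument concrete, here is a small standalone sketch of the table-level lock conflict rules as I understand them from the documented conflict matrix. It is not PostgreSQL code and includes no PostgreSQL headers; the `sketch_` names are made up for illustration, and the mode numbering only mirrors the usual ordering. The point it shows is that a decoder asking for AccessShareLock on a catalog is blocked only by an AccessExclusiveLock holder.

```c
#include <assert.h>
#include <stdbool.h>

/* Table-level lock modes, weakest to strongest (standalone copy for illustration). */
enum {
    AccessShareLock = 1, RowShareLock, RowExclusiveLock,
    ShareUpdateExclusiveLock, ShareLock, ShareRowExclusiveLock,
    ExclusiveLock, AccessExclusiveLock
};

#define BIT(m) (1 << (m))

/* sketch_conflicts[m] = set of modes that conflict with mode m. */
static const int sketch_conflicts[] = {
    0,
    /* AccessShareLock */ BIT(AccessExclusiveLock),
    /* RowShareLock */ BIT(ExclusiveLock) | BIT(AccessExclusiveLock),
    /* RowExclusiveLock */ BIT(ShareLock) | BIT(ShareRowExclusiveLock) |
        BIT(ExclusiveLock) | BIT(AccessExclusiveLock),
    /* ShareUpdateExclusiveLock */ BIT(ShareUpdateExclusiveLock) | BIT(ShareLock) |
        BIT(ShareRowExclusiveLock) | BIT(ExclusiveLock) | BIT(AccessExclusiveLock),
    /* ShareLock */ BIT(RowExclusiveLock) | BIT(ShareUpdateExclusiveLock) |
        BIT(ShareRowExclusiveLock) | BIT(ExclusiveLock) | BIT(AccessExclusiveLock),
    /* ShareRowExclusiveLock */ BIT(RowExclusiveLock) | BIT(ShareUpdateExclusiveLock) |
        BIT(ShareLock) | BIT(ShareRowExclusiveLock) | BIT(ExclusiveLock) |
        BIT(AccessExclusiveLock),
    /* ExclusiveLock */ BIT(RowShareLock) | BIT(RowExclusiveLock) |
        BIT(ShareUpdateExclusiveLock) | BIT(ShareLock) | BIT(ShareRowExclusiveLock) |
        BIT(ExclusiveLock) | BIT(AccessExclusiveLock),
    /* AccessExclusiveLock */ BIT(AccessShareLock) | BIT(RowShareLock) |
        BIT(RowExclusiveLock) | BIT(ShareUpdateExclusiveLock) | BIT(ShareLock) |
        BIT(ShareRowExclusiveLock) | BIT(ExclusiveLock) | BIT(AccessExclusiveLock)
};

/* True if a request for "wanted" must wait behind an existing "held" lock. */
static bool sketch_modes_conflict(int held, int wanted)
{
    assert(held >= AccessShareLock && held <= AccessExclusiveLock);
    assert(wanted >= AccessShareLock && wanted <= AccessExclusiveLock);
    return (sketch_conflicts[wanted] & BIT(held)) != 0;
}
```

So a prepared transaction that only holds, say, RowExclusiveLock on pg_class (from ordinary DDL) does not stall the decoder's AccessShareLock request, whereas an explicit "LOCK pg_class" does.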
Is there any other locking scenario that we need to consider? Otherwise, are we all ok on this point being a non-issue for 2PC logical decoding?

Now on to the second issue:

2PC Logical decoding with concurrent "ABORT PREPARED" of the same
=========================================================

Before 2PC, we always decoded regular committed transaction records. Now with prepared transactions, we run the risk of decoding while some other backend comes in and does COMMIT PREPARED or ROLLBACK PREPARED simultaneously. If the other backend commits, that's not an issue at all.

The issue is with a concurrent rollback of the prepared transaction. We need a way to ensure that the 2PC transaction does not abort while we are in the midst of a change record apply activity.

One way to handle this is to interlock the abort prepared with an ongoing logical decoding operation for a bounded period of at most one change record apply cycle.

I am outlining one solution but am all ears for better, more elegant solutions.

* We introduce two new booleans in the TwoPhaseState GlobalTransactionData structure:

    bool beingdecoded;
    bool abortpending;

1) Before we start iterating through the change records, if it happens to be a prepared transaction, we check "abortpending" in the corresponding TwoPhaseState entry. If it's not set, then we set "beingdecoded". If "abortpending" is set, we know that this transaction is going to go away, so we treat it like a regular abort and do not do any decoding at all.

2) With "beingdecoded" set, we start with the first change record from the iteration, decode it and apply it.

3) Before starting to decode the next change record, we re-check whether "abortpending" is set. If it is, we do not decode the next change record. Thus the abort is delay-bounded to a maximum of one change record decoding/apply cycle after we signal our intent to abort it. Then we need to send ABORT to the subscriber. This is a regular ABORT, not ROLLBACK PREPARED, since we have not sent PREPARE yet; we cannot send PREPARE midway because the transaction block as a whole might not be consistent. We will have to add an ABORT callback in pgoutput for this. There's only a COMMIT callback as of now. The subscribers will ABORT this transaction midway because of this. We can then follow this up with a DUMMY prepared txn, e.g. "BEGIN; PREPARE TRANSACTION 'gid';". The reasoning for the DUMMY 2PC is mentioned below in (6).

4) Keep decoding change records as long as "abortpending" is not set.

5) At the end of the change set, send PREPARE to the subscribers and then clear the "beingdecoded" flag in the TwoPhaseState entry. We are now free to commit/rollback the prepared transaction anytime.

6) We will still decode the "ROLLBACK PREPARED" WAL entry when it comes to us on the provider. This will call the abort_prepared callback on the subscriber. I have already added this in my patch. This abort_prepared callback will abort the dummy PREPARED transaction from step (3) above. Instead of doing this, we could actually check if the 'GID' entry exists and only then call ROLLBACK PREPARED on the subscriber. But in that case we can't be sure whether the GID does not exist because of a rollback-during-decode issue on the provider or due to something else. If we are ok with not finding GIDs on the subscriber side, then I am fine with removing the DUMMY prepare from step (3).

7) While the above activity is happening, if another backend wants to abort the prepared transaction, it will set "abortpending". If "beingdecoded" is true, the abort prepared function will wait until it clears, by releasing the lock and re-checking in a few moments. When "beingdecoded" clears (which will happen before the next change record apply in walsender, when it sees "abortpending" set), the abort prepared can go ahead as usual.

Note that we will have to be careful to clear this "beingdecoded" flag even if the decoding fails, the subscription is dropped, or any other issue occurs.
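The flag handshake in steps (1) to (7) can be sketched as a tiny single-threaded state machine. This is only a model of the proposal, not code from the patch: every `Sketch`/`sketch_` name is hypothetical, the concurrent abort request is simulated by the `abort_at` parameter, and a real implementation would have to read and write both flags under the lock that protects the TwoPhaseState entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two flags proposed for GlobalTransactionData. */
typedef struct SketchGxact {
    bool beingdecoded;
    bool abortpending;
} SketchGxact;

typedef enum {
    DECODE_SKIPPED,         /* step 1: abort already pending, nothing decoded */
    DECODE_ABORTED_MIDWAY,  /* step 3: stopped between two change records */
    DECODE_PREPARED         /* step 5: whole change set decoded, PREPARE sent */
} SketchDecodeResult;

/* Step 7, abort side: flag the intent; true means the abort may proceed now,
 * false means the caller must wait and re-check later. */
static bool sketch_abort_try(SketchGxact *g)
{
    g->abortpending = true;
    return !g->beingdecoded;
}

/* Decoder side. abort_at is the index of the change record during which a
 * simulated concurrent abort request arrives (negative for none).
 * *applied receives the number of change records actually applied. */
static SketchDecodeResult
sketch_decode(SketchGxact *g, int nchanges, int abort_at, int *applied)
{
    assert(g != NULL && applied != NULL);
    *applied = 0;

    if (g->abortpending)            /* step 1: treat as a regular abort */
        return DECODE_SKIPPED;
    g->beingdecoded = true;

    for (int i = 0; i < nchanges; i++)
    {
        if (g->abortpending)        /* steps 3 and 4: re-check before each record */
        {
            g->beingdecoded = false;    /* lets the waiting abort proceed */
            return DECODE_ABORTED_MIDWAY;
        }
        (*applied)++;               /* step 2: decode and apply one record */
        if (i == abort_at)
            (void) sketch_abort_try(g); /* returns false here: abort must wait */
    }

    g->beingdecoded = false;        /* step 5: PREPARE sent, flag cleared */
    return DECODE_PREPARED;
}
```

With five change records and an abort request arriving during the second one, the decoder applies two records, stops before the third, and clears "beingdecoded", after which a retry of `sketch_abort_try()` succeeds. That is the one-record delay bound described in step (3). The sketch does not cover the error paths mentioned in the note above, where the flag must also be cleared.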
With the flag reliably cleared, this can work fine, IMO.

Thoughts? Holes in the theory? Other issues?

I am attaching my latest and greatest WIP patch, which does not contain any of the above abort handling yet.

Regards,
Nikhils
-- 
Nikhil Sontakke                   http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services
Attachment: 2pc_logical_22_11_17.patch