On Thu, 28 Nov 2002 12:59:21 -0500 (EST), Bruce Momjian
<[EMAIL PROTECTED]> wrote:
>Yes, locking is one possible solution, but no one likes that. One hack
>lock idea would be to create a subtransaction-only lock, [...]
>
>> [...] without
>> having to touch the xids in the tuple headers.
>
>Yes, you could do that, but we can easily just set the clog bits
>atomically,

From what I read above I don't think we can *easily* set more than one
transaction's bits atomically.

> and it will not be needed --- the tuple bits really don't
>help us, I think.

Yes, this is what I said, or at least tried to say. I just wanted to
make clear how this new approach (use the fourth status) differs from
older proposals (replace subtransaction ids in tuple headers).

>OK, we put it in a file. And how do we efficiently clean it up?
>Remember, it is only to be used for a _brief_ period of time. I think a
>file system solution is doable if we can figure out a way not to create
>a file for every xid.

I don't want to create one file for every transaction, but rather a
huge (sparse) array of parent xids. This array is divided into
manageable chunks, represented by files, "pg_subtrans_NNNN". These
files are only created when necessary. At any time only a tiny part
of the whole array is kept in shared buffers. This concept is similar
or almost equal to pg_clog, which is an array of doublebits.

>Maybe we write the xid's to a file in a special directory in sorted
>order, and backends can do a btree search of each file in that directory
>looking for the xid, and then knowing the master xid, look up that
>status, and once all the children xid's are updated, you delete the
>file.

Yes, dense arrays or btrees are other possible implementations. But
for simplicity I'd do it pg_clog style.

>Yes, but again, the xid status of subtransactions is only update just
>before commit of the main transaction, so there is little value to
>having those visible.

Having them visible solves the atomicity problem without requiring
long locks. Updating the status of a single (main or sub) transaction
is atomic, just like it is now.

Here is what is to be done for some operations:

BEGIN main transaction:
    Get a new xid (no change to current behaviour).
    pg_clog[xid] is still 00, meaning active.
    pg_subtrans[xid] is still 0, meaning no parent.

BEGIN subtransaction:
    Push current transaction info onto local stack.
    Get a new xid.
    Record parent xid in pg_subtrans[xid].
    pg_clog[xid] is still 00.

ROLLBACK subtransaction:
    Set pg_clog[xid] to 10 (aborted).
    Optionally set clog bits for subsubtransactions to 10.
    Pop transaction info from stack.

COMMIT subtransaction:
    Set pg_clog[xid] to 11 (committed subtrans).
    Don't touch clog bits for subsubtransactions!
    Pop transaction info from stack.

ROLLBACK main transaction:
    Set pg_clog[xid] to 10 (aborted).
    Optionally set clog bits for subtransactions to 10.

COMMIT main transaction:
    Set pg_clog[xid] to 01 (committed).
    Optionally set clog bits for subtransactions from 11 to 01.
    Don't touch clog bits for aborted subtransactions!

Visibility check by other transactions: If a tuple is visited and its
XMIN/XMAX_IS_COMMITTED/ABORTED flags are not yet set, pg_clog has to
be consulted to find out the status of the inserting/deleting
transaction xid. If pg_clog[xid] is ...

    00: transaction still active

    10: aborted

    01: committed

    11: committed subtransaction, have to check parent

Only in this last case do we have to get parentxid from pg_subtrans.
Now we look at pg_clog[parentxid]. If we find ...

    00: parent still active, so xid is considered active, too

    10: parent aborted, so xid is considered aborted,
        optionally set pg_clog[xid] = 10

    01: parent committed, so xid is considered committed,
        optionally set pg_clog[xid] = 01

    11: recursively check grandparent(s) ...

For brevity the following operations are not covered in detail:
. Visibility checks for tuples inserted/deleted by a (sub)transaction
  belonging to the current transaction tree (have to check local
  transaction stack whenever we look at a xid or switch to a parent xid)
. HeapTupleSatisfiesUpdate (sometimes has to wait for parent
  transaction)

The trick here is, that subtransaction status is immediately updated
in pg_clog on commit/abort. Main transaction commit is atomic (just
set its commit bit).

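The operations and the visibility check above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the two dictionaries are hypothetical in-memory stand-ins for pg_clog and
pg_subtrans, the function names are invented here, and visibility inside
the current transaction tree (the local stack) is left out, as it is in
the description.

```python
# Two-bit status codes as used in the proposal.
ACTIVE = 0b00
COMMITTED = 0b01
ABORTED = 0b10
SUBCOMMITTED = 0b11  # committed subtransaction, outcome depends on parent

# Hypothetical in-memory stand-ins for the on-disk arrays.
pg_clog = {}      # xid -> status; a missing entry means 00 (active)
pg_subtrans = {}  # xid -> parent xid; a missing entry means 0 (no parent)


def begin(xid, parent=0):
    """BEGIN: record the parent for a subtransaction; clog stays 00."""
    if parent:
        pg_subtrans[xid] = parent


def commit(xid):
    """COMMIT: a subtransaction gets 11, a main transaction gets 01."""
    pg_clog[xid] = SUBCOMMITTED if pg_subtrans.get(xid, 0) else COMMITTED


def rollback(xid):
    """ROLLBACK: one single-entry update, for main and sub alike."""
    pg_clog[xid] = ABORTED


def resolve(xid):
    """Visibility check: follow parent links while the status is 11."""
    status = pg_clog.get(xid, ACTIVE)
    if status != SUBCOMMITTED:
        return status
    final = resolve(pg_subtrans[xid])
    if final != ACTIVE:
        # Optional side effect: replace 11 with the final status.
        pg_clog[xid] = final
    return final
```

For example, after begin(100), begin(101, parent=100) and commit(101), a
call to resolve(101) still reports the subtransaction as active; it
reports committed or aborted only once commit(100) or rollback(100) has
set the single entry of the main transaction.
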
Status 11 is short-lived, it is replaced with the final status by one
or more of

    - COMMIT/ROLLBACK of the main transaction
    - a later visibility check (as a side effect)
    - VACUUM

pg_subtrans cleanup: A pg_subtrans_NNNN file covers a known range of
transaction ids. As soon as none of these transactions has a pg_clog
status of 11, the pg_subtrans_NNNN file can be removed. VACUUM can do
this, and it won't even have to check the heap.

Servus
 Manfred
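
A minimal sketch of the pg_subtrans cleanup rule, in Python. The chunk
size, the hexadecimal rendering of NNNN and all function names are
assumptions made for illustration; none of them is given in the
description. The check follows the rule exactly as stated and does not
consider transactions in the range that are still active (00).

```python
SUBCOMMITTED = 0b11      # status 11 from the proposal
XIDS_PER_FILE = 1 << 16  # assumed chunk size, not specified in the post


def subtrans_file(xid):
    """Name of the hypothetical chunk file holding the parent of xid."""
    return "pg_subtrans_%04X" % (xid // XIDS_PER_FILE)


def xid_range(file_no):
    """Range of transaction ids covered by chunk number file_no."""
    start = file_no * XIDS_PER_FILE
    return range(start, start + XIDS_PER_FILE)


def can_remove(file_no, clog):
    """True once no xid covered by the chunk still has status 11.

    clog maps xid -> two-bit status; missing entries count as 00.
    Only the clog is consulted, never the heap.
    """
    return all(clog.get(xid, 0) != SUBCOMMITTED for xid in xid_range(file_no))
```

Because each file covers a fixed xid range, the check needs nothing but
the clog entries for that range, which is why VACUUM would not have to
look at the heap.
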