Hi,

if it is acceptable for subtransactions to use up transaction numbers,
then here is a half baked RFC for a possible implementation.
If not, forget the rest of this message.

The proposed implementation works much like the current transaction
handling.  It needs an additional system table
pg_subtrans (child XactId PRIMARY KEY, parent XactId).

BEGIN;  -- starts a new (top level) transaction, say 100

INSERT row1;  -- row1.xmin = 100
DELETE row2;  -- row2.xmax = 100

BEGIN;  -- starts a subtransaction, let's call it 200,
        -- stores 100 on the parent transaction stack
        -- (a local memory structure),
        -- inserts (200, 100) into pg_subtrans

INSERT row3;  -- row3.xmin = 200, row3.XMIN_IS_SUB = true

DELETE row4;  -- row4.xmax = 200, row4.XMAX_IS_SUB = true

COMMIT;  -- resets CurrentTransaction to 100 (pop from xact stack),
         -- does *NOT* mark T200 as committed

BEGIN;  -- starts a subtransaction, let's call it 300,
        -- pushes 100 on the parent transaction stack,
        -- inserts (300, 100) into pg_subtrans

BEGIN;  -- starts a 3rd level subtransaction (400),
        -- pushes 300 on the parent transaction stack,
        -- inserts (400, 300) into pg_subtrans

...

COMMIT;  -- resets CurrentTransaction to 300 (transaction stack),
         -- does NOT mark T400 as committed

INSERT row5;  -- row5.xmin = 300, row5.XMIN_IS_SUB = true

DELETE row6;  -- row6.xmax = 300, row6.XMAX_IS_SUB = true

ROLLBACK;  -- resets CurrentTransaction to 100 (transaction stack),
           -- optionally removes (300, 100) from pg_subtrans,
           -- marks T300 as aborted

COMMIT;  -- marks T100 as committed
or
ROLLBACK;  -- marks T100 as aborted


Visibility:
-----------

The checks for xmin and xmax are very similar.  We look at xmin here:

Traditionally a tuple is visible, if xmin has committed before the
current snapshot was taken, or if xmin == CurrentTransaction().

A subtransaction is considered aborted, if it is marked aborted.  Else
it is considered to be in the same state as its parent transaction
(which again can be a subtransaction).

The effects of tup.xmin are considered visible, if ...
(This is not a formal specification.  It shall only illustrate the
difference to the existing implementation of HeapTupleSatisfiesXxx()
in tqual.c)

    if (tup.XMIN_ABORTED)  // flag set by prior visitor
        return false;

    if (tup.XMIN_COMMITTED)  // flag set by prior visitor
        return true;

    // xmin neither known aborted nor known committed,
    // could be active
    // or finished and tup not yet visited
    for (xmin = tup.xmin; IsValid(xmin); xmin = GetParentXact(xmin)) {
        if (TransactionIdDidAbort(xmin)) {
            tup.XMIN_ABORTED = true;
            return false;
        }/*if*/

        if (IsCurrentTransaction(xmin)) {
            // tup.xmin is one of my own subtransactions,
            // it is already committed.  So tup can be
            // considered belonging to the current transaction.
            tup.xmin = xmin;
            tup.XMIN_IS_SUB = CurrentTransactionIsSub();
            return true;  // or rather check cmin ...
        }/*if*/
        
        if (TransactionIdDidCommit(xmin)) {
            // xmin is a top level transaction
            tup.xmin = xmin;
            tup.XMIN_IS_SUB = false;
            tup.XMIN_COMMITTED = true;
            return true;
        }/*if*/

        if (!tup.XMIN_IS_SUB) {
            // Don't try expensive GetParentXact()
            break;
        }/*if*/
    }/*for*/

    // tup.xmin still active
    return false;

TransactionId GetParentXact(TransactionId xnum) uses pg_subtrans to
find the parent transaction of xnum.  It returns InvalidTransaction,
if it doesn't find one.


Performance:
------------

.  Zero overhead, if nested transactions are not used.

.  BEGIN SUB has to insert a pair of TransactionIds into pg_subtrans.
Apart from that it is not slower than BEGIN top level transaction.

.  COMMIT SUB is faster than COMMIT.

.  ROLLBACK SUB is much like ROLLBACK, plus (optionally) deletes one
entry from pg_subtrans.

.  COMMIT and ROLLBACK of top level transactions don't care about
subtransactions.

.  Access a tuple inserted/deleted by a subtransaction:  Zero
overhead, if the subtransaction has been rolled back, otherwise the
parent transaction has to be looked up in pg_subtrans (possibly
recursive).  This price has to be paid only once per tuple (well, once
for xmin and once for xmax).  More accurate: "once after the
inserting/deleting top level transaction has finished".


Problems:
---------

.  pg_subtrans grows by 8 bytes per subtransaction.

.  Other pitfalls???


Administration:
---------------

As soon as a top level transaction has finished, its subtransaction
ids are replaced by the top level transaction id on the next access to
each tuple.

VACUUM (*not* VACUUM tablename) removes old entries from pg_subtrans.
An entry is old, if the parent transaction has finished, before VACUUM
started.


Challenge:
----------

For heavy use of subtransactions there has to be a really fast
implementation of pg_subtrans, maybe something like a b-tree.

AFAICS small WAL changes: pg_subtrans inserts (and deletes?) have to
be logged.

Everything else is straightforward.

Comments?

Servus
 Manfred

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Reply via email to