On Jun 21, 2012, at 13:41, Andres Freund wrote:
> 5.)
> The actually good idea. Yours?

What about a mixture of (3b) and (4), which writes the data not to the WAL but to a separate logical replication log? More specifically:

There's a per-backend queue of change notifications.

Whenever a non-catalog tuple is modified, we queue a TUPLE_MODIFIED record containing (xid, databaseoid, tableoid, old xmin, old ctid, new ctid).

Whenever a table (or something that a table depends on) is modified, we wait until all references to that table's oid have vanished from the queue, then queue a DDL record containing (xid, databaseoid, tableoid, text). Other backends cannot concurrently add further TUPLE_MODIFIED records, since we already hold an exclusive lock on the table at that point.

A background process continually processes these queues. If the front of the queue is a TUPLE_MODIFIED record, it fetches the old and the new tuple based on their ctids and writes the old tuple's PK and the full new tuple to the logical replication log. Since table modifications always wait for all previously queued TUPLE_MODIFIED records referencing that table to be processed *before* altering the catalog, tuples can always be interpreted according to the current (SnapshotNow) catalog contents.

Upon transaction COMMIT and ROLLBACK, we queue COMMIT and ROLLBACK records, which are also written to the log by the background process. The background process may decide to wait until a backend commits before processing that backend's queue. In that case, rolled-back transactions don't leave a trace in the logical replication log. Should a backend, however, issue a DDL statement, the background process *must* process that backend's queue immediately, since otherwise there's a deadlock.

The background process also maintains a value in shared memory which contains the oldest value in any of the queues' xid or "old xmin" fields. VACUUM and the like must not remove tuples whose xmin is >= that value.

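For illustration only, the queueing rules above can be sketched as a toy, single-threaded Python model. This is not PostgreSQL code: the names Record, ChangeQueues, wait_for_table and horizon are made up for the sketch, "waiting" for references to vanish is modeled as draining the affected queues synchronously, the table OIDs are arbitrary, and xid wraparound is ignored.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    kind: str                       # "TUPLE_MODIFIED", "DDL", "COMMIT" or "ROLLBACK"
    xid: int
    tableoid: Optional[int] = None
    old_xmin: Optional[int] = None  # only meaningful for TUPLE_MODIFIED
    payload: Optional[str] = None   # ctids or DDL text, opaque in this model


class ChangeQueues:
    """Toy model of the per-backend queues (hypothetical, for illustration)."""

    def __init__(self):
        self.queues = {}  # backend id -> deque of Record
        self.log = []     # stands in for the logical replication log

    def process_front(self, backend):
        # What the background process would do for one record. A real
        # implementation would fetch the tuples by ctid at this point.
        rec = self.queues[backend].popleft()
        self.log.append((backend, rec.kind, rec.xid, rec.tableoid))

    def wait_for_table(self, tableoid):
        # Drain each queue, in order, until it no longer references the table.
        for backend, queue in self.queues.items():
            while any(r.tableoid == tableoid for r in queue):
                self.process_front(backend)

    def enqueue(self, backend, rec):
        if rec.kind == "DDL":
            # All earlier references must be processed before the catalog changes.
            self.wait_for_table(rec.tableoid)
        self.queues.setdefault(backend, deque()).append(rec)

    def horizon(self):
        # Oldest xid or old xmin still queued; VACUUM must keep tuples at or
        # after this value. Plain min() is an assumption (no wraparound).
        values = [v
                  for queue in self.queues.values()
                  for r in queue
                  for v in (r.xid, r.old_xmin)
                  if v is not None]
        return min(values) if values else None


if __name__ == "__main__":
    cq = ChangeQueues()
    cq.enqueue(1, Record("TUPLE_MODIFIED", xid=100, tableoid=16384, old_xmin=90))
    cq.enqueue(2, Record("TUPLE_MODIFIED", xid=101, tableoid=16385, old_xmin=95))
    print(cq.horizon())
    # Backend 2 alters table 16384: backend 1's pending record is flushed first.
    cq.enqueue(2, Record("DDL", xid=101, tableoid=16384,
                         payload="ALTER TABLE t ADD COLUMN c int"))
    print(cq.log, cq.horizon())
```

The model deliberately leaves out the "wait until commit" policy; adding it would mean the drain in wait_for_table has to override that policy for the backend issuing the DDL, which is exactly the deadlock case mentioned above.
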
Hint bits *may* be set for newest tuples though, provided that the background process ignores hint bits when fetching the old and new tuples.

best regards,
Florian Pflug

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers