On Thursday, June 21, 2012 04:05:54 PM Florian Pflug wrote:
> On Jun21, 2012, at 13:41 , Andres Freund wrote:
> > 5.)
> > The actually good idea. Yours?
>
> What about a mixture of (3b) and (4), which writes the data not to the WAL
> but to a separate logical replication log. More specifically:
>
> There's a per-backend queue of change notifications.
>
> Whenever a non-catalog tuple is modified, we queue a TUPLE_MODIFIED
> record containing (xid, databaseoid, tableoid, old xmin, old ctid, new
> ctid)
>
> Whenever a table (or something that a table depends on) is modified we
> wait until all references to that table's oid have vanished from the queue,
> then queue a DDL record containing (xid, databaseoid, tableoid, text).
> Other backends cannot concurrently add further TUPLE_MODIFIED records since
> we already hold an exclusive lock on the table at that point.
>
> A background process continually processes these queues. If the front of
> the queue is a TUPLE_MODIFIED record, it fetches the old and the new tuple
> based on their ctids and writes the old tuple's PK and the full new tuple
> to the logical replication log. Since table modifications always wait for
> all previously queued TUPLE_MODIFIED records referencing that table to be
> processed *before* altering the catalog, tuples can always be interpreted
> according to the current (SnapshotNow) catalog contents.
>
> Upon transaction COMMIT and ROLLBACK, we queue COMMIT and ROLLBACK records,
> which are also written to the log by the background process. The background
> process may decide to wait until a backend commits before processing that
> backend's log. In that case, rolled back transactions don't leave a trace in
> the logical replication log. Should a backend, however, issue a DDL
> statement, the background process *must* process that backend's queue
> immediately, since otherwise there's a deadlock.
>
> The background process also maintains a value in shared memory which
> contains the oldest value in any of the queue's xid or "old xmin" fields.
> VACUUM and the like must not remove tuples whose xmin is >= that value.
> Hint bits *may* be set for newest tuples though, provided that the
> background process ignores hint bits when fetching the old and new tuples.

I think that's too complicated to fly. Getting that to recover cleanly in case
of a crash would mean you'd need another WAL.
I think if it comes to that, going for 1) is more realistic...

Andres

-- 
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
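
For anyone following the thread, the queue scheme quoted above can be
sketched roughly as below. This is only an illustration in Python, not
PostgreSQL code: every class and function name is hypothetical, the
queues are plain in-memory deques, xid wraparound is ignored, and the
step that fetches tuples by ctid is left out (records are logged as
queued). It models just three points from the proposal: a DDL has to
wait while TUPLE_MODIFIED records for its table are still queued, the
oldest xid or "old xmin" in any queue is tracked for VACUUM, and a
backend's records are written only once it commits.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TupleModified:
    xid: int
    database_oid: int
    table_oid: int
    old_xmin: int
    old_ctid: tuple
    new_ctid: tuple


@dataclass
class DDL:
    xid: int
    database_oid: int
    table_oid: int
    text: str


@dataclass
class Commit:
    xid: int


@dataclass
class Rollback:
    xid: int


def table_referenced(queues, table_oid):
    """True while any queue still holds a TUPLE_MODIFIED for table_oid.

    In the proposal, a DDL on that table waits until this turns False.
    """
    return any(
        isinstance(rec, TupleModified) and rec.table_oid == table_oid
        for q in queues for rec in q
    )


def oldest_needed_xid(queues):
    """Oldest xid or old xmin in any queue, or None if all are empty.

    Plain integer comparison; real xids wrap around, which is ignored here.
    """
    values = []
    for q in queues:
        for rec in q:
            values.append(rec.xid)
            if isinstance(rec, TupleModified):
                values.append(rec.old_xmin)
    return min(values, default=None)


def flush_finished_transaction(queue, log):
    """Model of the "wait until the backend commits" policy.

    Returns False and leaves the queue untouched if no COMMIT or ROLLBACK
    is queued yet. Otherwise removes the records up to and including it
    and appends them to log only if the transaction committed.
    """
    end = next((i for i, rec in enumerate(queue)
                if isinstance(rec, (Commit, Rollback))), None)
    if end is None:
        return False
    records = [queue.popleft() for _ in range(end + 1)]
    if isinstance(records[-1], Commit):
        log.extend(records)
    return True


# One backend modifies a tuple and then commits.
backend = deque()
replication_log = []
backend.append(TupleModified(100, 1, 16384, 90, (0, 1), (0, 2)))
flush_finished_transaction(backend, replication_log)  # nothing written yet
backend.append(Commit(100))
flush_finished_transaction(backend, replication_log)  # both records written
```

Note that the queues here live only in memory, so nothing in them
survives a crash. Making them recoverable is the part that would need
its own WAL-like mechanism, which is the objection raised above.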