On 5/23/2010 8:38 PM, Robert Haas wrote:
On Sun, May 23, 2010 at 4:21 PM, Jan Wieck <janwi...@yahoo.com> wrote:
The system will have postgresql.conf options for enabling/disabling the
whole shebang, how many shared buffers to allocate for managing access
to the data and to define the retention period of the data based on data
volume and/or age of the commit records.

It would be nice if this could just be managed out of shared_buffers
rather than needing to configure a separate pool just for this
feature.  But, I'm not sure how much work that is, and if it turns out
to be too ugly then I'd say it's not a hard requirement.  In general,
I think we talked during the meeting about the desirability of folding
specific pools into shared_buffers rather than managing them
separately, but I'm not aware that we have any cases where we do that
today so it might be hard (or not).

I'm not sure the retention policies of the shared buffer cache, the WAL buffers, CLOG buffers and every other thing we try to cache are that easy to fold into one single set of logic. But I'm all ears.


Each record of the Transaction Commit Info consists of

    txid          xci_transaction_id
    timestamptz   xci_begin_timestamp
    timestamptz   xci_commit_timestamp
    int64         xci_total_rowcount

32 bytes total.
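
As a rough illustration of that layout, the record could be declared as the C struct below. The struct name and the two stand-in typedefs are assumptions made for this sketch; it further assumes that both the txid and the timestamptz are 64-bit integers (integer datetimes), which is what makes the total come out at 32 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the backend types; both are assumed to be 64 bits wide. */
typedef uint64_t xci_txid;          /* epoch-extended transaction id */
typedef int64_t  xci_timestamptz;   /* microseconds, integer datetimes assumed */

/* Hypothetical struct name; the proposal only lists the four fields. */
typedef struct XactCommitInfo
{
    xci_txid        xci_transaction_id;
    xci_timestamptz xci_begin_timestamp;
    xci_timestamptz xci_commit_timestamp;
    int64_t         xci_total_rowcount;
} XactCommitInfo;

/* Four 8-byte fields with no padding: 32 bytes total. */
static_assert(sizeof(XactCommitInfo) == 32, "record must be 32 bytes");
```

Dropping the row count would shrink the record to 24 bytes.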

Are we sure it's worth including the row count?  I wonder if we ought
to leave that out and let individual clients of the mechanism track
that if they're so inclined, especially since it won't be reliable
anyway.

Nope, we (my belly and I) are not sure about the absolute worth of the row count. It would be a convenient number to have there, but I can live without it.


CommitTransaction() inside of xact.c will call a function that inserts
a new record into this array. Most of the time, the operation will be
nothing more than taking a spinlock and adding the record to shared memory.
All the data for the record is readily available, does not require
further locking and can be collected locally before taking the spinlock.
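
A minimal sketch of that fast path, outside the backend: the record is assembled locally, and only the sequence assignment and the copy into the array happen under the lock. All names here are hypothetical, a C11 atomic_flag stands in for a real spinlock, a static array stands in for shared memory, and page switching is ignored (the array simply wraps around).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define XCI_SLOTS 1024              /* hypothetical array capacity */

typedef struct XciRecord
{
    uint64_t xid;
    int64_t  begin_ts;
    int64_t  commit_ts;
    int64_t  rowcount;
} XciRecord;

static_assert(sizeof(XciRecord) == 32, "record must be 32 bytes");

static atomic_flag xci_lock = ATOMIC_FLAG_INIT;  /* stand-in for a spinlock */
static uint64_t    xci_next_seq = 0;             /* next sequence number */
static XciRecord   xci_slots[XCI_SLOTS];         /* stand-in for shared memory */

/* Append one commit record and return its sequence number. */
uint64_t
xci_append(uint64_t xid, int64_t begin_ts, int64_t commit_ts, int64_t rowcount)
{
    /* Collect everything locally before touching the lock. */
    XciRecord rec = { xid, begin_ts, commit_ts, rowcount };
    uint64_t  seq;

    while (atomic_flag_test_and_set_explicit(&xci_lock, memory_order_acquire))
        ;   /* spin until the lock is free */

    seq = xci_next_seq++;
    xci_slots[seq % XCI_SLOTS] = rec;   /* simplification: wrap instead of paging */

    atomic_flag_clear_explicit(&xci_lock, memory_order_release);
    return seq;
}
```

Because the sequence number is assigned under the same lock that orders the inserts, sequence order and insertion order cannot diverge.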

What happens when you need to switch pages?

Then the code will have to grab another free buffer or evict one.


The function will return the "sequence" number which CommitTransaction()
in turn will record in the WAL commit record together with the
begin_timestamp. While both the begin and the commit timestamp are
crucial to determine what data a particular transaction should have
seen, the row count is not and will not be recorded in WAL.

It would certainly be better if we didn't have to bloat the commit xlog
records to do this.  Is there any way to avoid that?

If you can tell me how a crash recovering system can figure out what the exact "sequence" number of the WAL commit record at hand should be, let's rip it.


Checkpoint handling will call a function to flush the shared buffers.
Together with this, the information from WAL records will be sufficient
to recover this data (except for row counts) during crash recovery.

Right.

Exposing the data will be done via a set returning function. The SRF
takes two arguments: the maximum number of rows to return and the last
serial number processed by the reader. The advantage of such an SRF is that
the result can be used in a query that right away delivers audit or
replication log information in transaction commit order. The SRF can
return an empty set if no further transactions have committed since, or
an error if data segments needed to answer the request have already been
purged.
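
The three possible outcomes for a reader (rows, empty set, error) can be sketched as follows. This is not the SRF itself, only the lookup logic it would need; the type and function names are made up for the example, and it assumes serial numbers are contiguous within the retained data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct XciRow
{
    uint64_t serial;    /* commit-order serial number */
    uint64_t xid;
} XciRow;

typedef struct XciStore
{
    const XciRow *rows;          /* retained rows, in commit order */
    size_t        nrows;
    uint64_t      oldest_serial; /* serial of rows[0]; older data is purged */
} XciStore;

/*
 * Copy up to max_rows rows that follow last_serial into out.
 * Returns the number of rows copied (0 means nothing new has committed),
 * or -1 if the rows needed to continue have already been purged.
 */
long
xci_read(const XciStore *store, uint64_t last_serial, size_t max_rows,
         XciRow *out)
{
    uint64_t want = last_serial + 1;
    size_t   start, n, i;

    assert(store != NULL && out != NULL);

    if (want < store->oldest_serial)
        return -1;                      /* reader fell behind the purge */

    start = (size_t) (want - store->oldest_serial);
    if (start >= store->nrows)
        return 0;                       /* no further commits since */

    n = store->nrows - start;
    if (n > max_rows)
        n = max_rows;
    for (i = 0; i < n; i++)
        out[i] = store->rows[start + i];
    return (long) n;
}
```

A reader would call this in a loop, feeding the serial of the last row it processed back in as last_serial.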

Purging of the data will be possible in several different ways.
Autovacuum will call a function that drops segments of the data that are
outside the postgresql.conf configuration with respect to maximum age
or data volume. There will also be a function reserved for superusers to
explicitly purge the data up to a certain serial number.
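
One way the age/volume decision could look, as a sketch only: segments are dropped oldest first while either configured limit is exceeded. The segment struct, the function name and the convention that a limit of 0 means "disabled" are all assumptions of this example, not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct XciSegment
{
    int64_t  newest_commit_ts;  /* latest commit timestamp in the segment */
    uint64_t bytes;             /* size of the segment */
} XciSegment;

/*
 * Return how many of the oldest segments fall outside the retention
 * settings. segs[] is ordered oldest first. max_age == 0 disables the
 * age limit; max_bytes == 0 disables the volume limit.
 */
size_t
xci_segments_to_purge(const XciSegment *segs, size_t nsegs, int64_t now,
                      int64_t max_age, uint64_t max_bytes)
{
    uint64_t total = 0;
    size_t   drop = 0;
    size_t   i;

    assert(segs != NULL || nsegs == 0);

    for (i = 0; i < nsegs; i++)
        total += segs[i].bytes;

    while (drop < nsegs)
    {
        int too_old = max_age > 0 &&
                      segs[drop].newest_commit_ts < now - max_age;
        int too_big = max_bytes > 0 && total > max_bytes;

        if (!too_old && !too_big)
            break;
        total -= segs[drop].bytes;
        drop++;
    }
    return drop;
}
```

With both limits disabled the function never drops anything, so pruning is left entirely to the explicit superuser call.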

Dunno if autovacuuming this is the right way to go.  Seems like that
could lead to replication breaks, and it's also more work than not
doing that.  I'd just say that if you turn this on you're responsible
for pruning it, full stop.

It is an option. "Keep it until I tell you" is a perfectly valid configuration option. One you probably don't want to forget about, but valid nonetheless.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
