On 4/27/15 6:08 PM, Dave Jones wrote:
>> (Though, I dislike using timestamps to do change/history tracking, but
>> that's just me...)
> I've been playing around with history tracking (in the context of BI,
> typically with batch-loaded reporting databases) for about 7-8 years now
> and have always found timestamps perfect for the purpose. But are you
> perhaps referring to using them for audit purposes? If that's the case
> I'd agree entirely - they are absolutely the wrong tool for such things
> (which is something I need to make a bit more prominent in the docs -
> it's buried in the design section at the moment).

Most warehouses dumb things down to a day level, so it's probably OK there.

What I specifically don't like is that using a timestamp to try to determine the order in which things happened is fraught with gotchas. For starters, now() is fixed at transaction start, but a newer transaction may modify a table before an older one does - now the ordering is *backwards*. You have the same problem with using an XID. The only way I've thought of to make this guaranteed safe is to somehow serialize the logging with something like
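To make the gotcha concrete, here's a sketch of two interleaved sessions (table and column names are hypothetical) where the row written *first* carries the *later* timestamp, so ordering by the timestamp column reverses what actually happened:

```sql
-- Session A                          -- Session B
BEGIN;
-- now() frozen at 10:00:00
                                      BEGIN;
                                      -- now() frozen at 10:00:05
                                      INSERT INTO customer_history (...)
                                        VALUES (..., now());  -- writes first
                                      COMMIT;
INSERT INTO customer_history (...)
  VALUES (..., now());  -- writes second, but its timestamp sorts earlier
COMMIT;
```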

CREATE TABLE customer_history(
  customer_hid serial PRIMARY KEY -- hid == history_id
  , previous_customer_hid int REFERENCES customer_history
  , customer_id int NOT NULL REFERENCES customer
...
);
-- Each history record can have at most one successor...
CREATE UNIQUE INDEX ... ON customer_history(previous_customer_hid)
  WHERE previous_customer_hid IS NOT NULL;
-- ...and each customer can have at most one chain head.
CREATE UNIQUE INDEX ... ON customer_history(customer_id)
  WHERE previous_customer_hid IS NULL;

and then have a before trigger enforce

NEW.previous_customer_hid := customer_history__get_latest(NEW.customer_id)

where customer_history__get_latest() will 'walk the chain' starting with the first link (customer_id = blah AND previous_customer_hid IS NULL).
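A minimal sketch of what that trigger machinery could look like in plpgsql (the function names come from the text above, but the bodies here are one hypothetical implementation, not tested code):

```sql
CREATE OR REPLACE FUNCTION customer_history__get_latest(p_customer_id int)
RETURNS int LANGUAGE plpgsql AS $$
DECLARE
  v_hid  int;
  v_next int;
BEGIN
  -- Start at the head of the chain: the one row per customer
  -- with no predecessor.
  SELECT customer_hid INTO v_hid
    FROM customer_history
   WHERE customer_id = p_customer_id
     AND previous_customer_hid IS NULL;

  IF v_hid IS NULL THEN
    RETURN NULL;  -- no history yet; this insert becomes the head
  END IF;

  -- Walk the chain link by link until there is no successor.
  LOOP
    SELECT customer_hid INTO v_next
      FROM customer_history
     WHERE previous_customer_hid = v_hid;
    EXIT WHEN v_next IS NULL;
    v_hid := v_next;
  END LOOP;

  RETURN v_hid;
END;
$$;

CREATE OR REPLACE FUNCTION customer_history__set_previous()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  NEW.previous_customer_hid := customer_history__get_latest(NEW.customer_id);
  RETURN NEW;
END;
$$;

CREATE TRIGGER customer_history_set_previous
  BEFORE INSERT ON customer_history
  FOR EACH ROW EXECUTE PROCEDURE customer_history__set_previous();
```

(In practice you could skip the walk entirely and just select the row that has no successor, but the chain walk matches the description above.)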

Because of those indexes, inserts are serialized on a per-customer basis. You could still run into problems with a transaction on a newer snapshot creating a history record before one on an older snapshot does, though. :( If you included txid_current_snapshot() with each record you could probably detect when that happens.

Or did you mean ranges would be better? They certainly looked intriguing
when I started moving this stuff to Postgres, and I'd like to revisit
them in the near future since they offer capabilities I don't have with
timestamps (such as guaranteeing non-overlapping ranges via exclusion
constraints), but my initial tests suggested some rather major
performance degradation, so I put them on the back-burner at first.
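For reference, the kind of exclusion constraint alluded to above might look like this (table and column names are hypothetical; the btree_gist extension is needed to mix = and && in one GiST constraint):

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE customer_history_ranged (
  customer_id  int NOT NULL,
  valid_during tstzrange NOT NULL,
  -- Reject two history rows for the same customer
  -- whose validity periods overlap.
  EXCLUDE USING gist (customer_id WITH =, valid_during WITH &&)
);
```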

If you're going to keep both a start and end for each record you'd definitely want to do it with a range. If you're only keeping the change time then you can handle it differently.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers