On 10/2/14, 9:27 AM, Adam Brusselback wrote:
I've also tried to implement database versioning using JSON to log changes in
tables. Here it is: https://github.com/fxku/audit

I've got two versioning tables, one storing information about all transactions 
that happened and one where I put the JSON logs of row changes of each table.
I'm only logging old values and not complete rows.
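
The rough shape of it is something like this (a simplified sketch, not the
actual code from the repository above; the table, function and trigger names
here are made up):

CREATE TABLE transaction_log (
    txid       bigint PRIMARY KEY,
    stmt_date  timestamptz NOT NULL DEFAULT statement_timestamp()
);

CREATE TABLE table_log (
    id          bigserial PRIMARY KEY,
    txid        bigint NOT NULL REFERENCES transaction_log,
    table_name  text   NOT NULL,
    changes     json              -- old values only, NULL for INSERTs
);

CREATE OR REPLACE FUNCTION log_old_values() RETURNS trigger AS $$
BEGIN
    -- one row per transaction
    INSERT INTO transaction_log (txid)
    SELECT txid_current()
     WHERE NOT EXISTS (SELECT 1 FROM transaction_log
                        WHERE txid = txid_current());

    -- old values of the changed row, stored as JSON
    INSERT INTO table_log (txid, table_name, changes)
    VALUES (txid_current(), TG_TABLE_NAME,
            CASE WHEN TG_OP = 'INSERT' THEN NULL
                 ELSE row_to_json(OLD) END);
    RETURN NULL;  -- AFTER trigger, return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customer_log
    AFTER INSERT OR UPDATE OR DELETE ON customer
    FOR EACH ROW EXECUTE PROCEDURE log_old_values();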

Then I've got a function that recreates a database state at a given time into a 
separate schema - either to VIEWs, MVIEWs or TABLES. This database state could 
then be indexed in order to work with it. You can also reset the production 
state to the recreated past state.

Unfortunately I've got no time to work further on it at the moment, and I have
not done tests with many changes in the database, so I can't say whether the
recreation process scales well. One downside I've realised is that the
json_agg function has limits when I've got binary data - it gets too long. So
I'm really looking forward to using JSONB.

I have more plans in mind. By having a Transaction_Log table, it should be
possible to revert only certain transactions. I'm also thinking of parallel 
versioning, e.g. different users are all working with their version of the 
database and commit their changes to the production state. As I've got a unique 
history ID for each table and each row, I should be able to map the affected 
records.
Sorry I'm coming late to this thread. I agree that getting interested people 
together would be a good idea. Is there another mailing list we can do that 
with?

Versioning is also something I'm interested in, and I have put a lot of
thought into it (if not much actual code :( ). I'll also make some general
comments, if I may...


I think timestamps should be *heavily avoided* in versioning, because they are
frequently the wrong way to solve a problem. There are many use cases where
you're trying to answer "What values were in place when X happened?", and the
simplest, most fool-proof way to answer that is that when you create a record
for X, part of that record is a "history ID" that shows you the exact data
used. For example, if you're creating an invoicing system that has versioning
of customer addresses, you would not try to join an invoice with its address
using a timestamp; you would put an actual address_history_id in the invoice
table.
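
To make that concrete, a bare-bones sketch (hypothetical table names, just to
illustrate the shape of it):

-- The invoice records the exact address version it was created with,
-- so there is no "what was current as of this timestamp" guesswork.
CREATE TABLE customer_address_history (
    address_history_id  bigserial PRIMARY KEY,
    customer_id         bigint      NOT NULL,
    address             text        NOT NULL,
    valid_from          timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE invoice (
    invoice_id          bigserial PRIMARY KEY,
    customer_id         bigint NOT NULL,
    address_history_id  bigint NOT NULL REFERENCES customer_address_history
);

-- Getting the address that was on the invoice is then a plain join:
SELECT i.invoice_id, ah.address
  FROM invoice i
  JOIN customer_address_history ah USING (address_history_id);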

I thought I saw a reference to versioning sets of information. This is perhaps
the trickiest part. You first have to think about the non-versioned sets (i.e.
a customer may have many phone numbers) before you think about versioning the
set. In this example, you want the history of the *set* of phone numbers, not
of each individual number. Design it with full duplication of data first;
don't think about normalizing until you have the full set-versioning design.
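
As a rough, fully-duplicated sketch of what I mean (made-up table names):

-- Every change to a customer's phone numbers produces a whole new set;
-- each version stores the complete set, with no normalization yet.
CREATE TABLE customer_phone_set (
    phone_set_id  bigserial PRIMARY KEY,
    customer_id   bigint      NOT NULL,
    created_at    timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE customer_phone_set_member (
    phone_set_id  bigint NOT NULL REFERENCES customer_phone_set,
    phone_number  text   NOT NULL,
    PRIMARY KEY (phone_set_id, phone_number)
);

-- Anything that needs "the phone numbers as they were" just stores the
-- phone_set_id it was created with, the same way the invoice above stores
-- address_history_id.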

I understand the generic appeal of using something like JSON, but in reality I
don't see it working terribly well. It's likely to be on the slow side, and
it'll also be difficult to query. Instead, I think it makes more sense to
create actual history tables that derive their definition from the base table.
I've got code that extracts information (column name, data type, nullability)
from a table (or even a table definition), and it's not that complex. With the
work that's been done on capturing DDL changes, it shouldn't be too hard to
handle that automatically.
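
The extraction is roughly this kind of thing (a simplified sketch against
information_schema that ignores type modifiers, defaults and so on;
'customer' is just an example base table, not anything from my actual code):

-- Build a matching history table definition from the base table's columns.
SELECT format(
           'CREATE TABLE %I (history_id bigserial PRIMARY KEY, %s);',
           table_name || '_history',
           string_agg(
               format('%I %s%s',
                      column_name,
                      data_type,
                      CASE WHEN is_nullable = 'NO' THEN ' NOT NULL'
                           ELSE '' END),
               ', ' ORDER BY ordinal_position)
       )
  FROM information_schema.columns
 WHERE table_schema = 'public'
   AND table_name   = 'customer'
 GROUP BY table_name;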

