On Fri, Jul 22, 2016 at 4:15 AM, Anton Dignös <dign...@inf.unibz.it> wrote: > We would like to contribute to PostgreSQL a solution that supports the query > processing of "at each time point". The basic idea is to offer two new > operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the > ranges of tuples so that subsequent queries can use the usual grouping and > equality conditions to get the intended results.
I just wanted to chime in and say that the work these people have done is *amazing*. I read two of their papers yesterday [1, 2], and if you are interested in temporal data, I encourage you to read them too. The first one is only 12 pages and quite readable. After that the second is easy because it covers a lot of the same ground but adds "scaling" of values when a tuple is split, and some other interesting points. Their contributions could be used to implement SQL:2011 syntax but go way beyond that. Almost every project I work on could use temporal database support, but there is nothing available in the Open Source world. The temporal_tables extension [3] offers transaction-time support, which is great for auditing, but it has no valid-time support (aka application-time or state-time). Same with Magnus Hagander's TARDIS approach [4], Chronomodel [5] (an extension to the Rails ORM), or any other project I've seen. But valid-time is the more valuable dimension, because it tells you the history of the thing itself (not just when the database was changed). Also nothing is even attempting full bitemporal support. The ideas behind temporal data are covered extensively in Snodgrass's 1999 book [6], which shows how valuable it is to handle temporal data in a principled way, rather than ad hoc. But that book also demonstrates how complex the queries become to do things like temporal foreign key constraints and temporal joins. I was sad to learn that his proposed TSQL2 was rejected as a standard back in the 90s, although the critiques by C. J. Date [7] have some merit. In particular, since TSQL2 used *statement* modifiers, some of the behavior was unclear or bad when using subqueries, views, and set-returning functions. It makes more sense to have temporal *operators*, so alongside inner join you have temporal inner join, and likewise with temporal left outer join, temporal union/intersection/difference, temporal aggregation, etc. (I think the drawbacks of TSQL2 came from pursuing an unachievable goal, which was to enable seamlessly converting existing non-temporal tables to temporal without breaking any queries.) Another unsatisfactory approach at historical data, from the industry rather than academia, is in chapter 4 and elsewhere of Ralph Kimball's *Data Warehouse Toolkit* [8]. His first suggestion (Type 1 Dimensions) is to ignore the problem and overwrite old data with new. His Type 2 approach (make a new row) is better but loses the continuity between the old row and the new. Type 3 fixes that but supports only one change, not several. And anyway his ideas are tailored to star-schema designs so are not as broadly useful. Workarounds like bridge tables and "put the data in the fact table" are even more wedded to a star-schema approach. But I think his efforts do show how valuable historical data is, and how hard it is to handle without built-in support. As far as I can tell SQL:2011 avoids the statement modifier problem (I'm not 100% sure), but it is quite limited, mostly covering transaction-time semantics and not giving any way to do valid-time outer joins or aggregations. It is clearly an early first step. Unfortunately the syntax feels (to me) crippled by over-specificity, like it will have a hard time growing to support all the things you'd want to do. The research by Dignös et al shows how you can define temporal variants for every operator in the relational algebra, and then implement them by using just two transformations (ALIGN and NORMALIZE) combined with the existing non-temporal operators. It has a strong theoretical basis and avoids the TSQL2 problems with composability. And unlike SQL:2011 it has a great elegance and completeness I haven't seen anywhere else. I believe with range types the approach was to build up useful primitives rather than jumping straight to a less-factored full implementation of temporal features. (This in spite of SQL:2011 choosing to model begin/end times as separate columns, not as ranges. :-) It seems to me the Dignös work follows the same philosophy. Their ALIGN and NORMALIZE could be used to implement SQL:2011 features, but they are also useful for much more. In their papers they actually suggest that these transformations need not be exposed to end-users, although it was convenient to have access to them for their own research. I think it'd be great if Postgres's SQL dialect supported them though, since SQL:2011 leaves out so much. Anyway, I wanted to thank them for their excellent work, their generosity, and also their perseverance. ([1] is from 2012 and was built against Postgres 9.0!) I hope we take their contribution seriously, because it would truly move Postgres's temporal support beyond any database on the market. Yours, Paul [1] https://files.ifi.uzh.ch/boehlen/Papers/modf174-dignoes.pdf [2] http://www.zora.uzh.ch/id/eprint/130374/1/Extending_the_kernel.pdf [3] https://pgxn.org/dist/temporal_tables/ [4] https://www.youtube.com/watch?v=TRgni5q0YM8 [5] https://github.com/ifad/chronomodel [6] *Developing Time-Oriented Database Applications in SQL*, downloadable for free from https://www2.cs.arizona.edu/~rts/publications.html [7] http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=8CA34414C364C1D859CD0EAE7A714DFF?doi=10.1.1.116.7598&rep=rep1&type=pdf [8] http://www.dsc.ufcg.edu.br/~sampaio/Livros/alph%20Kimball.%20The%20Data%20Warehouse%20Toolkit..%20The%20Complete%20Guide%20to%20Dimensional%20Modelling%20(Wiley,2002)(ISBN%200471200247)(449s).pdf -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers