Hello, I'm interested in contributing some temporal database functionality to Postgres, starting with temporal primary and foreign keys. I know some other folks nearby interested in helping out, too. But before we begin I'd like to ask the community about complying with the SQL:2011 standard [1] for these things.
In SQL:2011, temporal features all build upon PERIODs, which are a new concept you can attach to tables. Each PERIOD is composed of a start column and an end column (both of some date/time type). You define PERIODs when you CREATE TABLE or ALTER TABLE. Then you refer to the periods when you create primary keys or foreign keys to make them temporal. There are also a handful of new operators for testing two ranges for overlap/succession/etc.[2] Most PERIODs are for tracking the history of a *thing* over time, but if the PERIOD is named SYSTEM_TIME it instead tracks the history of changes to *your database*.[3] (Google for "bitemporal" to read more about this.) Personally I think PERIODs are quite disappointing. They are not part of relational theory. They are not a column, but something else. If you say `SELECT * FROM t` you don't get `PERIODs` (as far as I can tell). But you can mention PERIODs approximately wherever you can mention columns [4], so now we have to support them when projecting, selecting, joining, aggregating, etc. (Or if we are permitted to not support them in some of those places, isn't that even worse?) You can see that PERIODs share a lot with Postgres's own range types. But ranges are a real column, requiring no special-case behavior, either for RDBMS implementers or SQL users. They have a richer set of operators.[5] They don't require any special declarations to put them in a table. They aren't limited to just date/time types. You can even define new range types yourself (e.g. I've found it helpful before to define inetrange and floatrange). Also the start/end columns of a PERIOD must be not nullable,[6] so that unbounded ranges must use sentinels like `01 JAN 0000` or `01 JAN 3000` instead. Also there is no way (as far as I can tell) to define and use a period within a subquery or CTE or view. Many of these criticisms of PERIODs you can find in [7], pages 403 - 410 (where "interval" means basically our own range types), plus others: for example PERIODs are always closed/open, you can only have a single application PERIOD per table, they are wordy, etc. I expect that any Postgres implementation of the standard would wind up using ranges internally. For example a temporal primary key would use an exclusion constraint based on a range expression, so if you had a PERIOD defined on columns named `valid_start` and `valid_end`, the PK would use something like `EXCLUDE USING gist (id WITH =, tstzrange(valid_start, valid_end) WITH &&)`. Also the new SQL:2011 operators would be easy to implement on top of our range operators. And then a temporal foreign key implementation would use either those or raw range operators. So is there any way for Postgres to offer the same temporal features, but give users the choice of using either PERIODs or ranges? If we built that, would the community be interested in it? I think there are several possible ways to go about it: 1. Permit defining PERIODs on either a start/end column pair, or an existing range column. Then everything else continues to use PERIODs. This seems tidy to implement, although since it acquiesces to the PERIOD-based approach for temporal functionality, it doesn't solve all the problems above. Also as [9] points out, it would lead to incompatibilities in the new `information_schema` views. E.g. `periods` is supposed to have `start_column_name` and `end_column_name` columns.[8] 2. Permit either ranges or PERIODs in the new syntax, e.g. `PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)` where `valid_at` is either a PERIOD or a range column. Similarly with foreign keys. There is probably some `information_schema` messiness here too, but perhaps less than with #1. This seems like a great alternative to application-time PERIODs, but I'm not sure how you'd tell Postgres to use a range column for the system-time dimension.[3] Perhaps just a function, and then the PERIOD of `SYSTEM_TIME` would call that function (with a range expression). 3. Build our own abstractions on top of ranges, and then use those to implement PERIOD-based features. This is the least clear option, and I imagine it would require a lot more design effort. Our range types are already a step in this direction. Does anyone think this approach has promise? If so I can start thinking about how we'd do it. I imagine we could use a lot of the ideas in [7]. 4. Just give up and follow the standard to the letter. I'm not enthusiastic about this, but I also really want temporal features, so I might still do the work if that's what folks preferred. Left to my own devices I would probably go with a mix of #2 & #3, where temporal functionality is exposed by a layer of public functions that use ranges (maybe accepting PERIODs too), and then implement the PERIOD-based syntax by calling those functions. Using functions for the range-based layer isn't as "strong" an abstraction as designing SQL syntax, so it should be less effort, and also reduce risk of future conflicts. I'd still personally really appreciate *also* doing some #2 though, so that I could access those features via SQL syntax (not functions), but with ranges instead of PERIODs. What do the rest of you think? Also, just how strictly do we have to follow the standard? Requiring sentinels like '01 JAN 3000` just seems so silly. Could Postgres permit nullable start/end PERIOD columns, and give them the same meaning as ranges (unbounded)? Even if I forgot about ranges altogether, I'd sure love to avoid these sentinels. Finally: I know Vik Fearing already made a start at defining PERIODs.[9] I don't know if he's gone any further, but perhaps he can chime in if so. I'd be happy to build on what he's done already. I'm eager to start work on this, but I also want to get some community buy-in before I go too far. Temporal is such a massive set of concepts, I believe it's important to have some discussion before just jumping in. (Btw I've written an annotated bibliography about temporal databases at [10] if anyone wants to read more.) If I can do anything to facilitate a fuller plan, let me know. I can write up a more detailed proposal, etc. Thanks for your feedback! Yours, Paul [1] I'm using the draft docs at https://www.wiscorp.com/SQLStandards.html at the link titled "SQL:20nn Working Draft Documents". Several of the PDFs in that zip file mention the new temporal features, but by far the most important is Part 2 (7IWD2-02-Foundation-2011-12.pdf). If you search for "period" you should find lots of results. [2] 4.14.2: The operators are overlaps, equals, contains, precedes, succeeds, immediately precedes, and immediate succeeds. [3] See 4.14.1 for more about the special PERIOD named SYSTEM_TIME. Whereas application-time PERIODs store a history of a *thing*, the SYSTEM_TIME PERIOD stores a history of changes to the *database* itself. [4] In Part 12 the `<identifier chain>` of SQL syntax is amended to include `PERIODs` as well as columns. [5] https://www.postgresql.org/docs/current/static/functions-range.html [6] 4.6.5.3: "The columns shall both be of a datetime data type and known not nullable." [7] C. J. Date, Hugh Darwen, Nikos Lorentzos. Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL. 2nd edition, 2014. [8] 5.38 (7IWD2-11-Schemata-2011-12.pdf). [9] https://www.postgresql-archive.org/Periods-td6022563.html [10] https://illuminatedcomputing.com/posts/2017/12/temporal-databases-bibliography/