SQL:2011 PERIODS vs Postgres Ranges?

Paul A Jungwirth Sun, 21 Oct 2018 11:18:36 -0700

Hello,

I'm interested in contributing some temporal database functionality to
Postgres, starting with temporal primary and foreign keys. I know some
other folks nearby interested in helping out, too. But before we begin
I'd like to ask the community about complying with the SQL:2011
standard [1] for these things.


In SQL:2011, temporal features all build upon PERIODs, which are a new
concept you can attach to tables. Each PERIOD is composed of a start
column and an end column (both of some date/time type). You define
PERIODs when you CREATE TABLE or ALTER TABLE. Then you refer to the
periods when you create primary keys or foreign keys to make them
temporal. There are also a handful of new operators for testing two
ranges for overlap/succession/etc.[2] Most PERIODs are for tracking
the history of a *thing* over time, but if the PERIOD is named
SYSTEM_TIME it instead tracks the history of changes to *your
database*.[3] (Google for "bitemporal" to read more about this.)

Personally I think PERIODs are quite disappointing. They are not part
of relational theory. They are not a column, but something else. If
you say `SELECT * FROM t` you don't get `PERIODs` (as far as I can
tell). But you can mention PERIODs approximately wherever you can
mention columns [4], so now we have to support them when projecting,
selecting, joining, aggregating, etc. (Or if we are permitted to not
support them in some of those places, isn't that even worse?)

You can see that PERIODs share a lot with Postgres's own range types.
But ranges are a real column, requiring no special-case behavior,
either for RDBMS implementers or SQL users. They have a richer set of
operators.[5] They don't require any special declarations to put them
in a table. They aren't limited to just date/time types. You can even
define new range types yourself (e.g. I've found it helpful before to
define inetrange and floatrange). Also the start/end columns of a
PERIOD must be not nullable,[6] so that unbounded ranges must use
sentinels like `01 JAN 0000` or `01 JAN 3000` instead. Also there is
no way (as far as I can tell) to define and use a period within a
subquery or CTE or view. Many of these criticisms of PERIODs you can
find in [7], pages 403 - 410 (where "interval" means basically our own
range types), plus others: for example PERIODs are always closed/open,
you can only have a single application PERIOD per table, they are
wordy, etc.

I expect that any Postgres implementation of the standard would wind
up using ranges internally. For example a temporal primary key would
use an exclusion constraint based on a range expression, so if you had
a PERIOD defined on columns named `valid_start` and `valid_end`, the
PK would use something like `EXCLUDE USING gist (id WITH =,
tstzrange(valid_start, valid_end) WITH &&)`. Also the new SQL:2011
operators would be easy to implement on top of our range operators.
And then a temporal foreign key implementation would use either those
or raw range operators.

So is there any way for Postgres to offer the same temporal features,
but give users the choice of using either PERIODs or ranges? If we
built that, would the community be interested in it? I think there are
several possible ways to go about it:

1. Permit defining PERIODs on either a start/end column pair, or an
existing range column. Then everything else continues to use PERIODs.
This seems tidy to implement, although since it acquiesces to the
PERIOD-based approach for temporal functionality, it doesn't solve all
the problems above. Also as [9] points out, it would lead to
incompatibilities in the new `information_schema` views. E.g.
`periods` is supposed to have `start_column_name` and
`end_column_name` columns.[8]

2. Permit either ranges or PERIODs in the new syntax, e.g. `PRIMARY
KEY (id, valid_at WITHOUT OVERLAPS)` where `valid_at` is either a
PERIOD or a range column. Similarly with foreign keys. There is
probably some `information_schema` messiness here too, but perhaps
less than with #1. This seems like a great alternative to
application-time PERIODs, but I'm not sure how you'd tell Postgres to
use a range column for the system-time dimension.[3] Perhaps just a
function, and then the PERIOD of `SYSTEM_TIME` would call that
function (with a range expression).

3. Build our own abstractions on top of ranges, and then use those to
implement PERIOD-based features. This is the least clear option, and I
imagine it would require a lot more design effort. Our range types are
already a step in this direction. Does anyone think this approach has
promise? If so I can start thinking about how we'd do it. I imagine we
could use a lot of the ideas in [7].

4. Just give up and follow the standard to the letter. I'm not
enthusiastic about this, but I also really want temporal features, so
I might still do the work if that's what folks preferred.

Left to my own devices I would probably go with a mix of #2 & #3,
where temporal functionality is exposed by a layer of public functions
that use ranges (maybe accepting PERIODs too), and then implement the
PERIOD-based syntax by calling those functions. Using functions for
the range-based layer isn't as "strong" an abstraction as designing
SQL syntax, so it should be less effort, and also reduce risk of
future conflicts. I'd still personally really appreciate *also* doing
some #2 though, so that I could access those features via SQL syntax
(not functions), but with ranges instead of PERIODs. What do the rest
of you think?

Also, just how strictly do we have to follow the standard? Requiring
sentinels like '01 JAN 3000` just seems so silly. Could Postgres
permit nullable start/end PERIOD columns, and give them the same
meaning as ranges (unbounded)? Even if I forgot about ranges
altogether, I'd sure love to avoid these sentinels.

Finally: I know Vik Fearing already made a start at defining
PERIODs.[9] I don't know if he's gone any further, but perhaps he can
chime in if so. I'd be happy to build on what he's done already.

I'm eager to start work on this, but I also want to get some community
buy-in before I go too far. Temporal is such a massive set of
concepts, I believe it's important to have some discussion before just
jumping in. (Btw I've written an annotated bibliography about temporal
databases at [10] if anyone wants to read more.) If I can do anything
to facilitate a fuller plan, let me know. I can write up a more
detailed proposal, etc. Thanks for your feedback!

Yours,
Paul

[1] I'm using the draft docs at
https://www.wiscorp.com/SQLStandards.html at the link titled "SQL:20nn
Working Draft Documents". Several of the PDFs in that zip file mention
the new temporal features, but by far the most important is Part 2
(7IWD2-02-Foundation-2011-12.pdf). If you search for "period" you
should find lots of results.

[2] 4.14.2: The operators are overlaps, equals, contains, precedes,
succeeds, immediately precedes, and immediate succeeds.

[3] See 4.14.1 for more about the special PERIOD named SYSTEM_TIME.
Whereas application-time PERIODs store a history of a *thing*, the
SYSTEM_TIME PERIOD stores a history of changes to the *database*
itself.

[4] In Part 12 the `<identifier chain>` of SQL syntax is amended to
include `PERIODs` as well as columns.

[5] https://www.postgresql.org/docs/current/static/functions-range.html

[6] 4.6.5.3: "The columns shall both be of a datetime data type and
known not nullable."

[7] C. J. Date, Hugh Darwen, Nikos Lorentzos. Time and Relational
Theory, Second Edition: Temporal Databases in the Relational Model and
SQL. 2nd edition, 2014.

[8] 5.38 (7IWD2-11-Schemata-2011-12.pdf).

[9] https://www.postgresql-archive.org/Periods-td6022563.html

[10] 
https://illuminatedcomputing.com/posts/2017/12/temporal-databases-bibliography/

SQL:2011 PERIODS vs Postgres Ranges?

Reply via email to