Hi,

I am using generate_series + a join to group time-series data into
buckets, which works well as long as I do this only for one
aggregation hierarchy: The index on the timestamp of the table with
the actual time-series data is used for a nested loop index join.

However with more aggregation levels (one aggregation stage consuming
the output of the previous one) chained together using CTEs, there is
no index available and postgresql falls back to (no-index) nested loop
joins.

Example, with the actual access to the data-table replaced, as there
is no index it triggers the nested loop join immediatly (despite
MATERIALIZED + ORDER BY):

WITH series1h AS MATERIALIZED (SELECT generate_series AS ts FROM
generate_series('1990-01-01 00:00', '1990-12-31 23:59', INTERVAL '1
hour') ORDER BY ts),
series15m AS MATERIALIZED (SELECT generate_series AS ts FROM
generate_series('1990-01-01 00:00', '1990-12-31 23:59', INTERVAL '15
minutes') ORDER BY ts)
SELECT count(*) FROM (SELECT h1.ts, count(*) FROM series1h h1 JOIN
series15m m15 ON (m15.ts > (h1.ts - INTERVAL '1 hour') AND  m15.ts <=
h1.ts ) GROUP BY h1.ts ORDER BY h1.ts);

date_bin would allow to join on equality, however according to the
docs  it doesn't support months/years: The stride interval must be
greater than zero and cannot contain units of month or larger.

For now I am using temporary tables which can be indexed, are there
ways to avoid them?

Thanks, Clemens


Reply via email to