On Thu, Jun 1, 2023 at 2:47 AM Yugo NAGATA <nag...@sraoss.co.jp> wrote: > > On Thu, 1 Jun 2023 23:59:09 +0900 > Yugo NAGATA <nag...@sraoss.co.jp> wrote: > > > Hello hackers, > > > > Here's a rebased version of the patch-set adding Incremental View > > Maintenance support for PostgreSQL. That was discussed in [1]. > > > [1] > > https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp > > --------------------------------------------------------------------------------------- > * Overview > > Incremental View Maintenance (IVM) is a way to make materialized views > up-to-date by computing only incremental changes and applying them on > views. IVM is more efficient than REFRESH MATERIALIZED VIEW when > only small parts of the view are changed. > > ** Feature > > The attached patchset provides a feature that allows materialized views > to be updated automatically and incrementally just after a underlying > table is modified. > > You can create an incementally maintainable materialized view (IMMV) > by using CREATE INCREMENTAL MATERIALIZED VIEW command. > > The followings are supported in view definition queries: > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) > - some built-in aggregate functions (count, sum, avg, min, max) > - GROUP BY clause > - DISTINCT clause > > Views can contain multiple tuples with the same content (duplicate tuples). > > ** Restriction > > The following are not supported in a view definition: > - Outer joins > - Aggregates otehr than above, window functions, HAVING > - Sub-queries, CTEs > - Set operations (UNION, INTERSECT, EXCEPT) > - DISTINCT ON, ORDER BY, LIMIT, OFFSET > > Also, a view definition query cannot contain other views, materialized views, > foreign tables, partitioned tables, partitions, VALUES, non-immutable > functions, > system columns, or expressions that contains aggregates. > > --------------------------------------------------------------------------------------- > * Design > > An IMMV is maintained using statement-level AFTER triggers. > When an IMMV is created, triggers are automatically created on all base > tables contained in the view definition query. > > When a table is modified, changes that occurred in the table are extracted > as transition tables in the AFTER triggers. Then, changes that will occur in > the view are calculated by a rewritten view dequery in which the modified > table > is replaced with the transition table. > > For example, if the view is defined as "SELECT * FROM R, S", and tuples > inserted > into R are stored in a transiton table dR, the tuples that will be inserted > into > the view are calculated as the result of "SELECT * FROM dR, S". > > ** Multiple Tables Modification > > Multiple tables can be modified in a statement when using triggers, foreign > key > constraint, or modifying CTEs. When multiple tables are modified, we need > the state of tables before the modification. > > For example, when some tuples, dR and dS, are inserted into R and S > respectively, > the tuples that will be inserted into the view are calculated by the following > two queries: > > "SELECT * FROM dR, S_pre" > "SELECT * FROM R, dS" > > where S_pre is the table before the modification, R is the current state of > table, that is, after the modification. This pre-update states of table > is calculated by filtering inserted tuples and appending deleted tuples. > The subquery that represents pre-update state is generated in > get_prestate_rte(). > Specifically, the insterted tuples are filtered by calling > IVM_visible_in_prestate() > in WHERE clause. This function checks the visibility of tuples by using > the snapshot taken before table modification. The deleted tuples are contained > in the old transition table, and this table is appended using UNION ALL. > > Transition tables for each modification are collected in each AFTER trigger > function call. Then, the view maintenance is performed in the last call of > the trigger. > > In the original PostgreSQL, tuplestores of transition tables are freed at the > end of each nested query. However, their lifespan needs to be prolonged to > the end of the out-most query in order to maintain the view in the last AFTER > trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c. > > ** Duplicate Tulpes > > When calculating changes that will occur in the view (= delta tables), > multiplicity of tuples are calculated by using count(*). > > When deleting tuples from the view, tuples to be deleted are identified by > joining the delta table with the view, and tuples are deleted as many as > specified multiplicity by numbered using row_number() function. > This is implemented in apply_old_delta(). > > When inserting tuples into the view, each tuple is duplicated to the > specified multiplicity using generate_series() function. This is implemented > in apply_new_delta(). > > ** DISTINCT clause > > When DISTINCT is used, the view has a hidden column __ivm_count__ that > stores multiplicity for tuples. When tuples are deleted from or inserted into > the view, the values of __ivm_count__ column is decreased or increased as many > as specified multiplicity. Eventually, when the values becomes zero, the > corresponding tuple is deleted from the view. This is implemented in > apply_old_delta_with_count() and apply_new_delta_with_count(). > > ** Aggregates > > Built-in count sum, avg, min, and max are supported. Whether a given > aggregate function can be used or not is checked by using its OID in > check_aggregate_supports_ivm(). > > When creating a materialized view containing aggregates, in addition > to __ivm_count__, more than one hidden columns for each aggregate are > added to the target list. For example, columns for storing sum(x), > count(x) are added if we have avg(x). When the view is maintained, > aggregated values are updated using these hidden columns, also hidden > columns are updated at the same time. > > The maintenance of aggregated view is performed in > apply_old_delta_with_count() and apply_new_delta_with_count(). The SET > clauses for updating columns are generated by append_set_clause_*(). > > If the view has min(x) or max(x) and the minimum or maximal value is > deleted from a table, we need to update the value to the new min/max > recalculated from the tables rather than incremental computation. This > is performed in recalc_and_set_values(). > > --------------------------------------------------------------------------------------- > * Details of the patch-set (v28) > > > The patch-set consists of the following eleven patches. > > In the previous version, the number of patches were nine. > In the latest patch-set, the patches are divided more finely > aiming to make the review easier. > > > - 0001: Add a syntax to create Incrementally Maintainable Materialized Views > > The prposed syntax to create an incrementally maintainable materialized > view (IMMV) is; > > CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....; > > However, this syntax is tentative, so any suggestions are welcomed. > > > - 0002: Add relisivm column to pg_class system catalog > > We add a new field in pg_class to indicate a relation is IMMV. > Another alternative is to add a new catalog for managing materialized > views including IMMV, but I am not sure if we want this. > > > - 0003: Allow to prolong life span of transition tables until transaction > > end > > This patch fixes the trigger system to allow to prolong lifespan of > tuple stores for transition tables until the transaction end. We need > this because multiple transition tables have to be preserved until the > end of the out-most query when multiple tables are modified by nested > triggers. (as explained above in Design - Multiple Tables Modification) > > If we don't want to change the trigger system in such way, the alternative > is to copy the contents of transition tables to other tuplestores, although > it needs more time and memory. > > > - 0004: Add Incremental View Maintenance support to pg_dump > > This patch enables pg_dump to output IMMV using the new syntax. > > > - 0005: Add Incremental View Maintenance support to psql > > This patch implements tab-completion for the new syntax and adds > information of IMMV to \d meta-command results. > > > - 0006: Add Incremental View Maintenance support > > This patch implements the basic IVM feature. > DISTINCT and aggregate are not supported here. > > When an IMMV is created, the view query is checked, and if any > non-supported feature is used, it raises an error. If it is ok, > triggers are created on base tables and an unique index is > created on the view if possible. > > In BEFORE trigger, an entry is created for each IMMV and the number > of trigger firing is counted. Also, the snapshot just before the > table modification is stored. > > In AFTER triggers, each transition tables are preserved. The number > of trigger firing is counted also here, and when the firing number of > BEFORE and AFTER trigger reach the same, it is deemed the final AFTER > trigger call. > > In the final AFTER trigger, the IMMV is maintained. Rewritten view > query is executed to generate delta tables, and deltas are applied > to the view. If multiple tables are modified simultaneously, this > process is iterated for each modified table. Tables before processed > are represented in "pre-update-state", processed tables are > "post-update-state" in the rewritten query. > > > - 0007: Add DISTINCT support for IVM > > This patch adds DISTINCT clause support. > > When an IMMV including DISTINCT is created, a hidden column > "__ivm_count__" is added to the target list. This column has the > number of duplicity of the same tuples. The duplicity is calculated > by adding "count(*)" and GROUP BY to the view query. > > When an IMMV is maintained, the duplicity in __ivm_count__ is updated, > and a tuples whose duplicity becomes zero can be deleted from the view. > This logic is implemented by SQL in apply_old_delta_with_count and > apply_new_delta_with_count. > > Columns starting with "__ivm_" are deemed hidden columns that doesn't > appear when a view is accessed by "SELECT * FROM ....". This is > implemented by fixing parse_relation.c. > > > - 0008: Add aggregates support in IVM > > This patch provides codes for aggregates support, specifically > for builtin count, sum, and avg. > > When an IMMV containing an aggregate is created, it is checked if this > aggregate function is supported, and if it is ok, some hidden columns > are added to the target list. > > When the IMMV is maintained, the aggregated value is updated as well as > related hidden columns. The way of update depends the type of aggregate > functions, and SET clause string is generated for each aggregate. > > > - 0009: Add support for min/max aggregates for IVM > > This patch adds min/max aggregates support. > > This is separated from #0008 because min/max needs more complicated > work than count, sum, and avg. > > If the view has min(x) or max(x) and the minimum or maximal value is > deleted from a table, we need to update the value to the new min/max > recalculated from the tables rather than incremental computation. > This is performed in recalc_and_set_values(). > > TIDs and keys of tuples that need re-calculation are returned as a > result of the query that deleted min/max values from the view using > RETURNING clause. The plan to recalculate and set the new min/max value > are stored and reused. > > > - 0010: regression tests > > This patch provides regression tests for IVM. > > > - 0011: documentation > > This patch provides documantation for IVM. > > --------------------------------------------------------------------------------------- > * Changes from the Previous Version (v27) > > - Allow TRUNCATE on base tables > > When a base table is truncated, the view content will be empty if the > view definition query does not contain an aggregate without a GROUP clause. > Therefore, such views can be truncated. > > Aggregate views without a GROUP clause always have one row. Therefore, > if a base table is truncated, the view will not be empty and will contain > a row with NULL value (or 0 for count()). So, in this case, we refresh the > view instead of truncating it. > > - Fix bugs reported by huyajun [1] > > [1] > https://www.postgresql.org/message-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809%40qq.com > > --------------------------------------------------------------------------------------- > * Discussion > > ** Aggregate support > > There were a few suggestions that general aggregate functions should be > supported [2][3], which may be possible by extending pg_aggregate catalog. > However, we decided to leave supporting general aggregates to the future work > [4] > because it would need substantial works and make the patch more complex and > bigger. > > There has been no opposite opinion on this. However, if we need more > discussion > on the design of aggregate support, we can omit aggregate support for the > first > release of IVM. > > [2] > https://www.postgresql.org/message-id/20191128140333.GA25947%40alvherre.pgsql > [3] > https://www.postgresql.org/message-id/CAM-w4HOvDrL4ou6m%3D592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg%40mail.gmail.com > [4] > https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp > > ** Hidden columns > > In order to support DISTINCT or aggregates, our implementation uses hidden > columns. > > Columns starting with "__ivm_" are hidden columns that doesn't appear when a > view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is > fixed. There was a proposal to enable hidden columns by adding a new flag to > pg_attribute [5], but this thread is no longer active, so we decided to check > the hidden column by its name [6]. > > [5] > https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com > [6] > https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp > > ** Concurrent Transactions > > When the view definition has more than one table, we acquire an exclusive > lock before the view maintenance in order to avoid inconsistent results. > This behavior was explained in [7]. The lock was improved to use weaker lock > when the view has only one table based on a suggestion from Konstantin > Knizhnik [8]. > However, due to the implementation that uses ctid for identifying target > tuples, > we still have to use an exclusive lock for DELETE and UPDATE. > > [7] > https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp > [8] > https://www.postgresql.org/message-id/5663f5f0-48af-686c-bf3c-62d279567e2a%40postgrespro.ru > > ** Automatic Index Creation > > When a view is created, a unique index is automatically created if > possible, that is, if the view definition query has a GROUP BY or > DISTINCT, or if the view contains all primary key attributes of > its base tables in the target list. It is necessary for efficient > view maintenance. This feature is based on a suggestion from > Konstantin Knizhnik [9]. > > [9] > https://www.postgresql.org/message-id/89729da8-9042-7ea0-95af-e415df6da14d%40postgrespro.ru > > > ** Trigger and Transition Tables > > We implemented IVM based on triggers. This is because we want to use > transition tables to extract changes on base tables. Also, there are > other constraint that are using triggers in its implementation, like > foreign references. However, if we can use transition table like feature > without relying triggers, we don't have to insist to use triggers and we > might implement IVM in the executor directly as similar as declarative > partitioning. > > ** Feature to be Supported in the First Release > > The current patch-set supports DISTINCT and aggregates for built-in count, > sum, avg, min and max. Do we need all these feature for the first IVM release? > Supporting DISTINCT and aggregates needs discussion on hidden columns, and > for supporting min/max we need to discuss on re-calculation method. Before > handling such relatively advanced feature, maybe, should we focus to design > and implement of the basic feature of IVM? > > > Any suggestion and discussion are welcomed! > > Regards, > Yugo Nagata > > -- > Yugo NAGATA <nag...@sraoss.co.jp> > >
> The followings are supported in view definition queries: > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) > Also, a view definition query cannot contain other views, materialized views, > foreign tables, partitioned tables, partitions, VALUES, non-immutable > functions, > system columns, or expressions that contains aggregates. Does this also apply to tableoid? but tableoid is a constant, so it should be fine? can following two queries apply to this feature. select tableoid, unique1 from tenk1; select 1 as constant, unique1 from tenk1; I didn't apply the patch.(will do later, for someone to test, it would be a better idea to dump a whole file separately....).