Hi! On Thu, Dec 27, 2018 at 4:57 AM Yugo Nagata <nag...@sraoss.co.jp> wrote: > I would like to implement Incremental View Maintenance (IVM) on PostgreSQL. > IVM is a technique to maintain materialized views which computes and applies > only the incremental changes to the materialized views rather than > recomputate the contents as the current REFRESH command does.
That sounds great! I am interested in this topic because I am interested in reactive/live queries and support for them in PostgreSQL. [1] In that context, the problem is very similar: based on some state of query results and updated source tables, determine what should be new updates to send to the client describing changes to the query results. So after computing those incremental changes, instead of applying them to materialized view I would send them to the client. One could see materialized views only type of consumers of such information about incremental change. So I would like to ask if whatever is done in this setting is done in a way that one could also outside of the context of materialized view. Not sure what would API be thought. >From the perspective of reactive/live queries, this package [2] is interesting. To my understanding, it adds to all base tables two columns, one for unique ID and one for revision of the row. And then rewrites queries so that this information is passed all the way to query results. In this way it can then determine mapping between inputs and outputs. I am not sure if it then does incremental update or just uses that to determine if view is invalidated. Not sure if there is anything about such approach in literature. Or why both index and revision columns are needed. > For these reasons, we started to think to implement IVM without relying on > OIDs > and made a bit more surveys. I also do not see much difference between asking users to have primary key on base tables or asking them to have OIDs. Why do you think that a requirement for primary keys is a hard one? I think we should first focus on having IVM with base tables with primary keys. Maybe then later on we could improve on that and make it also work without. To me personally, having unique index on source tables and also on materialized view is a reasonable restriction for this feature. Especially for initial versions of it. > However, the discussion about IVM is now stoped, so we would like to restart > and > progress this. What would be next steps in your view to move this further? > If we can represent a change of UPDATE on a base table as query-like rather > than > OLD and NEW, it may be possible to update the materialized view directly > instead > of performing delete & insert. Why do you need OLD and NEW? Don't you need just NEW and a list of columns which changed from those in NEW? I use such diffing query [4] to represent changes: first column has a flag telling if the row is representing insert, update, and remove, the second column tells which column are being changed in the case of the update, and then the NEW columns follow. I think that maybe standardizing structure for representing those changes would be a good step towards making this modular and reusable. Because then we can have three parts: * Recording and storing changes in a standard format. * A function which given original data, stored changes, computes updates needed, also in some standard format. * A function which given original data and updates needed, applies them. > In the previous discussion[4], it is planned to start from "eager" approach. > In our PoC > implementaion, we used the other aproach, that is, using REFRESH command to > perform IVM. > I am not sure which is better as a start point, but I begin to think that the > eager > approach may be more simple since we don't have to maintain base table > changes in other > past transactions. I think if we split things into three parts as I described above, then this is just a question of configuration. Or you call all three inside one trigger to update in "eager" fashion. Or you store computed updates somewhere and then on demand apply those in "lazy" fashion. > In the eager maintenance approache, we have to consider a race condition > where two > different transactions change base tables simultaneously as discussed in [4]. But in the case of "lazy" maintenance there is a mirror problem: what if later changes to base tables invalidate some previous change to the materialized view. Imagine that one cell in a base table is first updated too "foo" and we compute an update for the materialized view to set it to "foo". And then the same cell is updated to "bar" and we compute an update for the materialized view again. If we have not applied any of those updates (because we are "lazy") now the previously computed update can be discarded. We could still apply both, but it would not be efficient. [1] https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com [2] https://github.com/nothingisdead/pg-live-query [3] https://www.postgresql.org/docs/devel/sql-createtable.html [4] https://github.com/tozd/node-reactive-postgres/blob/eeda4f28d096b6e552d04c5ea138c258cb5b9389/index.js#L329-L340 Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m