Hi Yugo. > I would like to implement Incremental View Maintenance (IVM) on > PostgreSQL.
Great. :-) I think it would address an important gap in PostgreSQL’s feature set. > 2. How to compute the delta to be applied to materialized views > > Essentially, IVM is based on relational algebra. Theorically, changes on > base > tables are represented as deltas on this, like "R <- R + dR", and the > delta on > the materialized view is computed using base table deltas based on "change > propagation equations". For implementation, we have to derive the > equation from > the view definition query (Query tree, or Plan tree?) and describe this as > SQL > query to compulte delta to be applied to the materialized view. We had a similar discussion in this thread https://www.postgresql.org/message-id/flat/FC784A9F-F599-4DCC-A45D-DBF6FA582D30%40QQdd.eu, and I’m very much in agreement that the "change propagation equations” approach can solve for a very substantial subset of common MV use cases. > There could be several operations for view definition: selection, > projection, > join, aggregation, union, difference, intersection, etc. If we can > prepare a > module for each operation, it makes IVM extensable, so we can start a > simple > view definition, and then support more complex views. Such a decomposition also allows ’stacking’, allowing complex MV definitions to be attacked even with only a small handful of modules. I did a bit of an experiment to see if "change propagation equations” could be computed directly from the MV’s pg_node_tree representation in the catalog in PlPgSQL. I found that pg_node_trees are not particularly friendly to manipulation in PlPgSQL. Even with a more friendly-to-PlPgSQL representation (I played with JSONB), then the next problem is making sense of the structures, and unfortunately amongst the many plan/path/tree utility functions in the code base, I figured only a very few could be sensibly exposed to PlPgSQL. Ultimately, although I’m still attracted to the idea, and I think it could be made to work, native code is the way to go at least for now. > 4. When to maintain materialized views > > [...] > > In the previous discussion[4], it is planned to start from "eager" > approach. In our PoC > implementaion, we used the other aproach, that is, using REFRESH command > to perform IVM. > I am not sure which is better as a start point, but I begin to think that > the eager > approach may be more simple since we don't have to maintain base table > changes in other > past transactions. Certainly the eager approach allows progress to be made with less infrastructure. I am concerned that the eager approach only addresses a subset of the MV use case space, though. For example, if we presume that an MV is present because the underlying direct query would be non-performant, then we have to at least question whether applying the delta-update would also be detrimental to some use cases. In the eager maintenance approache, we have to consider a race condition where two different transactions change base tables simultaneously as discussed in [4]. I wonder if that nudges towards a logged approach. If the race is due to fact of JOIN-worthy tuples been made visible after a COMMIT, but not before, then does it not follow that the eager approach has to fire some kind of reconciliation work at COMMIT time? That seems to imply a persistent queue of some kind, since we can’t assume transactions to be so small to be able to hold the queue in memory. Hmm. I hadn’t really thought about that particular corner case. I guess a ‘catch' could be simply be to detect such a concurrent update and demote the refresh approach by marking the MV stale awaiting a full refresh. denty. -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html