Re: Delta Materialized View Refreshes?
Hi folks, I had a crack at this, and it was pretty simple to get something working to play around with, and it seems like it might be useful. I developed it against 10.1, as that's what I happened to be working with at the time. The patch is pretty small, and I hoped it would apply cleanly against 11. Unfortunately it doesn't, but I doubt the issues are substantial. If there is interest in moving this forward, I'll update and re-share. The patch enables pretty much exactly what Jeremy suggests — something like "refresh materialized view concurrently testview where type = 'main';" — with fairly obvious semantics. Welcome comments on the patch or approach. denty. (Seems I can't attach via the web interface, so copy/paste patch below.) -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Delta Materialized View Refreshes?
Hi folks, I've shared a new patch against 11.0, which seems to work as expected. (Message ID 5100c2b3-641b-4a35-86d0-12ed2e618...@qqdd.eu.) While playing with it, it is actually quite easy to get it confused. And so I wonder — is it actually what we want? For example, if I refresh including a WHERE that filters /out/ some content presently in the MV, but filters /in/ some new content relating to those same rows, then we predictably get a fail. Using the following example MV MV, 'testview', AS SELECT test.type, test.message, count(1) AS count FROM test GROUP BY test.type, test.message, then a refresh materialized view concurrently testview where type = 'main' and count>2 hits: ERROR: duplicate key value violates unique constraint "testview_type_message_idx" DETAIL: Key (type, message)=(main, hello world) already exists. CONTEXT: SQL statement "INSERT INTO public.testview SELECT (diff.newdata).* FROM pg_temp_3.pg_temp_16390_2 diff WHERE tid IS NULL" The message can probably be cleaned up. But the root cause is clearly in the fact that REFRESH ... WHERE really needs to be used quite carefully. I mused about restricting the WHERE clause Vars to allow reference only to columns that are part of the MV's UNIQUE index. It seems it would prevent the issue arising in my simple example, but is it always necessary? And would it be overly restrictive? (For example: would it prevent people issuing delta refreshes and including clauses that make the refresh performant — because perhaps it helps the planner see a short cut to speedy execution?) On a different topic, in implementing it, I noticed that there is rudimentary code-level support for incremental refreshes (see Open/CloseMatViewIncrementalMaintenance() and MatViewIncrementalMaintenanceIsEnabled()), but the facility is not hook-able. There's another discussion (Flexible permissions for REFRESH MATERIALIZED VIEW), and I wonder if a more interesting feature would be to either allow the incremental refresh barriers to be hooked by extensions, or even to offer a fine-grained permission that allows direct manipulation of data in the MV's underlying table. Far as I can see, allowing extensions to hook the incremental refresh APIs would be trivial. Exposing the same via a fine-grained permission would certainly be much harder but it might enable advanced delta-refresh strategies to emerge that are written in high level languages such as PL/pgSQL or Java (etc.) — that is certainly desirable. denty. -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Implementing Incremental View Maintenance
Hi Yugo. > I would like to implement Incremental View Maintenance (IVM) on > PostgreSQL. Great. :-) I think it would address an important gap in PostgreSQL’s feature set. > 2. How to compute the delta to be applied to materialized views > > Essentially, IVM is based on relational algebra. Theorically, changes on > base > tables are represented as deltas on this, like "R <- R + dR", and the > delta on > the materialized view is computed using base table deltas based on "change > propagation equations". For implementation, we have to derive the > equation from > the view definition query (Query tree, or Plan tree?) and describe this as > SQL > query to compulte delta to be applied to the materialized view. We had a similar discussion in this thread https://www.postgresql.org/message-id/flat/FC784A9F-F599-4DCC-A45D-DBF6FA582D30%40QQdd.eu, and I’m very much in agreement that the "change propagation equations” approach can solve for a very substantial subset of common MV use cases. > There could be several operations for view definition: selection, > projection, > join, aggregation, union, difference, intersection, etc. If we can > prepare a > module for each operation, it makes IVM extensable, so we can start a > simple > view definition, and then support more complex views. Such a decomposition also allows ’stacking’, allowing complex MV definitions to be attacked even with only a small handful of modules. I did a bit of an experiment to see if "change propagation equations” could be computed directly from the MV’s pg_node_tree representation in the catalog in PlPgSQL. I found that pg_node_trees are not particularly friendly to manipulation in PlPgSQL. Even with a more friendly-to-PlPgSQL representation (I played with JSONB), then the next problem is making sense of the structures, and unfortunately amongst the many plan/path/tree utility functions in the code base, I figured only a very few could be sensibly exposed to PlPgSQL. Ultimately, although I’m still attracted to the idea, and I think it could be made to work, native code is the way to go at least for now. > 4. When to maintain materialized views > > [...] > > In the previous discussion[4], it is planned to start from "eager" > approach. In our PoC > implementaion, we used the other aproach, that is, using REFRESH command > to perform IVM. > I am not sure which is better as a start point, but I begin to think that > the eager > approach may be more simple since we don't have to maintain base table > changes in other > past transactions. Certainly the eager approach allows progress to be made with less infrastructure. I am concerned that the eager approach only addresses a subset of the MV use case space, though. For example, if we presume that an MV is present because the underlying direct query would be non-performant, then we have to at least question whether applying the delta-update would also be detrimental to some use cases. In the eager maintenance approache, we have to consider a race condition where two different transactions change base tables simultaneously as discussed in [4]. I wonder if that nudges towards a logged approach. If the race is due to fact of JOIN-worthy tuples been made visible after a COMMIT, but not before, then does it not follow that the eager approach has to fire some kind of reconciliation work at COMMIT time? That seems to imply a persistent queue of some kind, since we can’t assume transactions to be so small to be able to hold the queue in memory. Hmm. I hadn’t really thought about that particular corner case. I guess a ‘catch' could be simply be to detect such a concurrent update and demote the refresh approach by marking the MV stale awaiting a full refresh. denty. -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Delta Materialized View Refreshes?
The idea of allowing a WHERE clause to be appended to REFRESH MATERIALIZED VIEW seems useful. It would enable those that know well the pattern of data modification in their underlying use case to schedule delta-updates (say, from crontab). And also it would be a useful as a foundation for more ambitious ideas. For example, some folk can/wish to do something entirely custom that is trigger-based. Others such as Nico's have toolkits that intend to cover quite a lot of the automagical refresh cases (see this other thread https://www.postgresql-archive.org/Query-Rewrite-for-Materialized-Views-FDW-Extension-td6015235.html), and it allows the data to remain in an MVIEW to boot, rather than forcing them to use normal tables instead. It could also be the foundation for something much more integrated such as Corey's CONTINUOUS MATERIALIZED VIEW concept. So I'm certainly supportive of the idea. -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html