Hi, I've updated the wiki page of Incremental View Maintenance.
https://wiki.postgresql.org/wiki/Incremental_View_Maintenance On Thu, 11 Jul 2019 13:28:04 +0900 Yugo Nagata <nag...@sraoss.co.jp> wrote: > Hi Thomas, > > Thank you for your review and discussion on this patch! > > > > 2019年7月8日(月) 15:32 Thomas Munro <thomas.mu...@gmail.com>: > > > > > > > On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata <nag...@sraoss.co.jp> > > > > wrote: > > > > > Attached is a WIP patch of IVM which supports some aggregate > > > > > functions. > > > > > > > > Hi Nagata-san and Hoshiai-san, > > > > > > > > Thank you for working on this. I enjoyed your talk at PGCon. I've > > > > added Kevin Grittner just in case he missed this thread; he has talked > > > > often about implementing the counting algorithm, and he wrote the > > > > "trigger transition tables" feature to support exactly this. While > > > > integrating trigger transition tables with the new partition features, > > > > we had to make a number of decisions about how that should work, and > > > > we tried to come up with answers that would work for IMV, and I hope > > > > we made the right choices! > > Transition tables is a great feature. I am now using this in my implementation > of IVM, but the first time I used this feature was when I implemented a PoC > for extending view updatability of PostgreSQL[1]. At that time, I didn't know > that this feature is made originally aiming to support IVM. > > [1] https://www.pgcon.org/2017/schedule/events/1074.en.html > > I think transition tables is a good choice to implement a statement level > immediate view maintenance where materialized views are refreshed in a > statement > level after trigger. However, when implementing a transaction level immediate > view maintenance where views are refreshed per transaction, or deferred view > maintenance, we can't update views in a after trigger, and we will need an > infrastructure to manage change logs of base tables. Transition tables can be > used to collect these logs, but using logical decoding of WAL is another > candidate. > In any way, if these logs can be collected in a tuplestore, we might able to > convert this to "ephemeral named relation (ENR)" and use this to calculate > diff > sets for views. > > > > > > > > > I am quite interested to learn how IVM interacts with SERIALIZABLE. > > > > > > > > > > Nagata-san has been studying this. Nagata-san, any comment? > > In SERIALIZABLE or REPEATABLE READ level, table changes occurred in other > ransactions are not visible, so views can not be maintained correctly in AFTER > triggers. If a view is defined on two tables and each table is modified in > different concurrent transactions respectively, the result of view maintenance > done in trigger functions can be incorrect due to the race condition. This is > the > reason why such transactions are aborted immediately in that case in my > current > implementation. > > One idea to resolve this is performing view maintenance in two phases. > Firstly, > views are updated using only table changes visible in this transaction. Then, > just after this transaction is committed, views have to be updated > additionally > using changes happened in other transactions to keep consistency. This is a > just > idea, but to implement this idea, I think, we will need keep to keep and > maintain change logs. > > > > > A couple of superficial review comments: > > > > > > > + const char *aggname = get_func_name(aggref->aggfnoid); > > > > ... > > > > + else if (!strcmp(aggname, "sum")) > > > > > > > > I guess you need a more robust way to detect the supported aggregates > > > > than their name, or I guess some way for aggregates themselves to > > > > specify that they support this and somehow supply the extra logic. > > > > Perhaps I just waid what Greg Stark already said, except not as well. > > Yes. Using name is not robust because users can make same name aggregates > like > sum(text) (although I am not sure this makes sense). We can use oids instead > of names, but it would be nice to extend pg_aggregate and add new attributes > for informing that this supports IVM and for providing functions for IVM > logic. > > > > > As for the question of how > > > > to reserve a namespace for system columns that won't clash with user > > > > columns, according to our manual the SQL standard doesn't allow $ in > > > > identifier names, and according to my copy SQL92 "intermediate SQL" > > > > doesn't allow identifiers that end in an underscore. I don't know > > > > what the best answer is but we should probably decide on a something > > > > based the standard. > > Ok, so we should use "__ivm_count__" since this ends in "_" at least. > > Another idea is that users specify the name of this special column when > defining materialized views with IVM support. This way can avoid the conflict > because users will specify a name which does not appear in the target list. > > As for aggregates supports, it may be also possible to make it a restriction > that count(expr) must be in the target list explicitly when sum(expr) or > avg(expr) is included, instead of making hidden column like __ivm_count_sum__, > like Oracle does. > > > > > > > > > As for how to make internal columns invisible to SELECT *, previously > > > > there have been discussions about doing that using a new flag in > > > > pg_attribute: > > > > > > > > > > > > https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com > > I agree implementing this feature in PostgreSQL since there are at least a few > use cases, IVM and temporal database. > > > > > > > > > + "WITH t AS (" > > > > + " SELECT diff.__ivm_count__, > > > > (diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid" > > > > + ", %s" > > > > + " FROM %s AS mv, %s AS diff WHERE (%s) = > > > > (%s)" > > > > + "), updt AS (" > > > > + " UPDATE %s AS mv SET __ivm_count__ = > > > > mv.__ivm_count__ - t.__ivm_count__" > > > > + ", %s " > > > > + " FROM t WHERE mv.ctid = t.ctid AND NOT > > > > for_dlt" > > > > + ") DELETE FROM %s AS mv USING t WHERE > > > > mv.ctid = t.ctid AND for_dlt;", > > > > > > > > I fully understand that this is POC code, but I am curious about one > > > > thing. These queries that are executed by apply_delta() would need to > > > > be converted to C, or at least used reusable plans, right? Hmm, > > > > creating and dropping temporary tables every time is a clue that the > > > > ultimate form of this should be tuplestores and C code, I think, > > > > right? > > I used SPI just because REFRESH CONCURRENTLY uses this, but, as you said, > it is inefficient to create/drop temp tables and perform parse/plan every > times. > It seems to be enough to perform this once when creating materialized views > or > at the first maintenance time. > > > Best regards, > Yugo Nagata > > > -- > Yugo Nagata <nag...@sraoss.co.jp> > > -- Yugo Nagata <nag...@sraoss.co.jp>