Re: Implementing Incremental View Maintenance

denty Mon, 31 Dec 2018 02:42:09 -0800

Hi Yugo.

> I would like to implement Incremental View Maintenance (IVM) on
> PostgreSQL.


Great. :-)

I think it would address an important gap in PostgreSQL’s feature set.

> 2. How to compute the delta to be applied to materialized views
> 
> Essentially, IVM is based on relational algebra. Theorically, changes on
> base
> tables are represented as deltas on this, like "R <- R + dR", and the
> delta on
> the materialized view is computed using base table deltas based on "change
> propagation equations".  For implementation, we have to derive the
> equation from
> the view definition query (Query tree, or Plan tree?) and describe this as
> SQL
> query to compulte delta to be applied to the materialized view.

We had a similar discussion in this thread
https://www.postgresql.org/message-id/flat/FC784A9F-F599-4DCC-A45D-DBF6FA582D30%40QQdd.eu,
and I’m very much in agreement that the "change propagation equations”
approach can solve for a very substantial subset of common MV use cases.

> There could be several operations for view definition: selection,
> projection, 
> join,  aggregation, union, difference, intersection, etc.  If we can
> prepare a
> module for each operation, it makes IVM extensable, so we can start a
> simple 
> view definition, and then support more complex views.

Such a decomposition also allows ’stacking’, allowing complex MV definitions
to be attacked even with only a small handful of modules.

I did a bit of an experiment to see if "change propagation equations” could
be computed directly from the MV’s pg_node_tree representation in the
catalog in PlPgSQL. I found that pg_node_trees are not particularly friendly
to manipulation in PlPgSQL. Even with a more friendly-to-PlPgSQL
representation (I played with JSONB), then the next problem is making sense
of the structures, and unfortunately amongst the many plan/path/tree utility
functions in the code base, I figured only a very few could be sensibly
exposed to PlPgSQL. Ultimately, although I’m still attracted to the idea,
and I think it could be made to work, native code is the way to go at least
for now.

> 4. When to maintain materialized views
> 
> [...]
> 
> In the previous discussion[4], it is planned to start from "eager"
> approach. In our PoC
> implementaion, we used the other aproach, that is, using REFRESH command
> to perform IVM.
> I am not sure which is better as a start point, but I begin to think that
> the eager
> approach may be more simple since we don't have to maintain base table
> changes in other
> past transactions.

Certainly the eager approach allows progress to be made with less
infrastructure.

I am concerned that the eager approach only addresses a subset of the MV use
case space, though. For example, if we presume that an MV is present because
the underlying direct query would be non-performant, then we have to at
least question whether applying the delta-update would also be detrimental
to some use cases.

In the eager maintenance approache, we have to consider a race condition
where two
different transactions change base tables simultaneously as discussed in
[4].

I wonder if that nudges towards a logged approach. If the race is due to
fact of JOIN-worthy tuples been made visible after a COMMIT, but not before,
then does it not follow that the eager approach has to fire some kind of
reconciliation work at COMMIT time? That seems to imply a persistent queue
of some kind, since we can’t assume transactions to be so small to be able
to hold the queue in memory.

Hmm. I hadn’t really thought about that particular corner case. I guess a
‘catch' could be simply be to detect such a concurrent update and demote the
refresh approach by marking the MV stale awaiting a full refresh.

denty.



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Implementing Incremental View Maintenance

Reply via email to