[HACKERS] counting algorithm for incremental matview maintenance

Kevin Grittner Tue, 14 May 2013 12:52:33 -0700

In surveying the literature on $subject, I find that most of the
theoretical work related to how to incrementally update
materialized views based on the matview declaration was published
between 1988 and 1993.  The best paper I have been able to find on
the topic was published in ACM SIGMOD in 1993[1], and covers two
algorithms: counting and DRed.  The former should be very fast for
non-recursive views, but not very efficient for recursive views.
The latter algorithm is the other way around -- it looks like it
will do well with recursive views but generally be slower for
non-recursive views.


It does not seem feasible to me to implement both techniques in a
single one-year PostgreSQL release.  In fact, if we have trouble
getting everyone onto the same page early, we might have to settle
for trying to just get some infrastructure into place, without
anything to actually make use of it.  That would be unfortunate,
since Oracle added incremental maintenance of matviews to their
existing feature in 1999, and have been improving it regularly
since then.  Many other products also have mature implementations
of this, and there seems to be a lot of demand for it in
PostgreSQL.  In the best of circumstances it will take years for us
to catch up on this front.

What I propose for 9.4 is this:

During the first CF I would like to have a discussion to settle on
how best to do some things that the counting algorithm requires, so
that following the first CF I can try to implement that
infrastructure and get at least simple matviews working for
incremental maintenance, and have something ready for the second
CF.  I can hack up a rough patch to suggest a possible approach for
purposes of discussion, but I'm not sure whether either that's
necessary or desirable.

----------
Needed infrastructure changes follow.
----------

First, the algorithm requires a new system column called
count_t (or similar).  It would be optional (like oid is) would
only be used by tuples in matviews maintained by the counting
algorithm, and by delta relations used in the matview incremental
maintenance process. Tables and matviews not created or altered to
specify incremental maintenance would not carry this new system
column.  Existing matviews from 9.3 would continue to work as
before until and unless they were ALTERed to specify incremental
maintenance; this form of ALTER would require rewrite of the
matview to add the count_t column.

This algorithm also requires that we have new execution node types
(or conditional behavior in existing nodes) to support processing
counts.  It is worth noting that relations without a count_t column
will sometimes need to be processed using the new logic (in which
case each tuple is treated as having an implied count of one).

count_t in a matview must always be greater than zero.  In delta
relations it can be negative (to reflect tuples being deleted or
the "before" image of an update) or positive (to reflect tuples
being inserted or the "after" image of an update).

Examples of the new behavior in nodes used by incremental
maintenance are:

 * DISTINCT or GROUP BY: count_t in the result should represent the
number of tuples being combined.

 * JOIN: count_t in the result is the product of multiplying the
counts from the pair of joined tuples.

 * UNION (without ALL): count_t in the result should be the value
from an unmatched tuple or the sum of the counts of matched tuples,
with any result row with a zero count_t being omitted from the
result.

Note that for nodes other than UNION, the logic is identical except
for calculation of the count for the output.

The resulting matviews can be used in the traditional set logic of
PostgreSQL, without knowing about or caring about the count_t
column, exactly as before; the new system column is only used for
maintenance.

We could drive the triggering of incremental maintenance off of the
dependency information which is already stored, but for performance
we probably want to add a new pg_class flag to indicate that the
relation is referenced by a matview definition which specifies
incremental update.  That would allow a fast path for skipping
other tests for DML on non-referenced relations, at the expense of
some additional catalog updates on some DDL.

----------
The above is everything that is a significant infrastructure change
needed for a first cut at incremental maintenance of materialized
views, driven by the view definition.  Below is additional detail
about what will happen when incremental maintenance for a view is
specified, for those who wonder why these infrastructure changes
are needed.
----------

Essentially, the technique is based on relational algebra.  This
allows previously existing theory to prove the correctness of the
technique and its optimizations.  The addition of the count_t
column allows accurate identification of exactly which rows need to
be touched during incremental maintenance, with minimal overhead.

As the paper notes, these incremental maintenance techniques are a
heuristic; there will always be cases where it is faster to
regenerate the data from scratch than to use these incremental
techniques, so part of the goal of having the transactional REFRESH
(as described in a separate thread) is to allow some front-end
tests to attempt to recognize when an incremental approach will be
slower than a REFRESH and fall back onto the transactional REFRESH
technique seamlessly and fairly quietly.  (My gut feeling on the
right level for an ereport of this is DEBUG1, although I can see
the argument for a LOG level entry.)

The original and modified versions of the relations (tables or
other matviews) which define a matview must be available to
calculate the matview deltas, so the snapshot information for this
must be available and registered until the matview delta has been
calculated.  They can be released once the delta has been
established and before it has been applied to the matview.  Initial
milestones in this development will focus on "eager" maintenance,
so this will not initially be a big issue, but will become more
important as we work on making more of the work asynchronous, so
that the foreground query is not held up maintaining data which is
allowed to be a little stale. This seems not entirely unrelated to
background worker processes and snapshot sharing.

I don't know whether we can get to this during 9.4 development, but
it might be worth at least considering as the other work is done:
some aggregate columns with a small amount of data kept while the
aggregate is being computed can be further optimized under this
technique by storing the working data in the tuple as some sort of
hidden or system column related to the aggregate.  A few of the
obvious candidates for such optimization include count(), sum() and
avg().  Note that with this capability, especially if we ever
implement the query rewrite capability for matviews, we would have
a nice answer for the count(*) speed complaints.

Obviously, there will need to be some new syntax around CMV and AMV
to support specification of how the matview should be maintained.

Also, we are moving into a phase where it makes sense to start the
bikeshedding on what timings we should support, how information
about the desired timings should be stored, how "freshness" should
be tracked, and what syntax we put around all that.  Frankly, I
think I will have my hands pretty full with the incremental
maintenance work, and I see the timing work as largely orthogonal,
so it would be great if anyone who was interested in that aspect of
things could take primary responsibility for moving that along.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://dl.acm.org/citation.cfm?id=170066

[HACKERS] counting algorithm for incremental matview maintenance

Reply via email to