Josh Berkus <j...@agliodbs.com> wrote:

> It's fairly common for matviews to be constructed such that
> updates to them are strictly appends.  For example, a matview
> which has a daily summary would just get appended to each day,
> and existing rows would not change barring a major historical
> database cleanup.
>
> It seems like we could ... and ought to ... optimize for this
> pattern somehow for incremental updates.  That is, if the user
> knows that we're going to be only appending new rows and not
> modifying any old ones, s/he ought to be able to tell the
> database that somehow and avoid the overhead of checking.  While
> the overhead of checking a count wouldn't be that high for a few
> hundred rows, I've dealt with matviews which were thousands to
> millions of rows.
Thanks for the suggestion; I will keep an eye out for ways this insight might allow an optimization.  I think there might be some misunderstanding of the counting algorithm, though -- there is no need to do a sequential pass through the matview examining the counts.

I don't want to replicate the content of a fairly dense (in the sense of having a lot of content per page) ten-page computer science paper in this email, but for purposes of illustration I will take a very simple case and show how it works.  (This is not geared toward your particular case, because that could get kinda long to explain here and now, but hopefully it will give some insight into the technique overall.)

Let's say there is a table and matview like this:

  create table foo (fooid int primary key, val int not null);
  create materialized view bar as select distinct val from foo;

Let's say there are millions of rows in both, and that we have flagged the view for incremental maintenance.  (Syntax omitted to avoid distracting bikeshedding on that when the point is the algorithm.)  Now, someone runs this:

  update foo set val = val + 1 where fooid between 1 and 10;

What will happen is this: since foo is flagged as a relation which is referenced by an incrementally maintained matview, a delta relation will be materialized for this update, which will contain the net change to the underlying table in the count_t system column.  "Before" tuples will have a count of -1; "after" tuples will have a count of 1.  Then the query defining the view will be run *against the delta*, resulting in a relation with a count_t column reflecting the net change for each val.  Anything with a zero net change will be dropped.  We will then run a logical UNION of this relation and the bar matview.
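To make the delta step concrete, here is a toy sketch in Python (nothing to do with the actual PostgreSQL implementation -- the names and representation are mine) of computing the net count_t change for the view "select distinct val from foo" given the old and new versions of the updated tuples:

```python
# Hypothetical sketch of the count_t delta computation, not PostgreSQL code.
from collections import Counter

def view_delta(before_rows, after_rows):
    """before_rows/after_rows: old and new versions of the updated foo
    tuples, as (fooid, val) pairs.  Returns {val: net_count} with
    zero-net entries dropped."""
    delta = Counter()
    # "Before" tuples contribute -1 to their val's count ...
    for _fooid, val in before_rows:
        delta[val] -= 1
    # ... and "after" tuples contribute +1.
    for _fooid, val in after_rows:
        delta[val] += 1
    # Anything with a zero net change is dropped.
    return {val: n for val, n in delta.items() if n != 0}

# update foo set val = val + 1 where fooid between 1 and 10;
# (assume all ten affected rows happened to have val = 5)
before = [(i, 5) for i in range(1, 11)]
after  = [(i, 6) for i in range(1, 11)]
print(view_delta(before, after))   # {5: -10, 6: 10}
```

Note that the delta's size is bounded by the number of distinct vals touched by the update, not by the size of foo or bar.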
In this case, we obviously want this to be done in a way that, for each row in this "net change" relation, we do an index scan against the bar matview.  If no matching row is found, we insert the new row into the matview with its count from the net change relation.  (That count had better be positive or we have a bug -- so elog ERROR if it is negative.)  If bar does contain a matching row, we update count_t in that row with the sum of its old value and the value from the "net change" relation.  Of course, that new value also had better be positive or zero -- and if it is zero we delete the old row rather than updating count_t.

The count_t column saves us from having to scan foo for all the old val values.  It does not require any scan of the entire bar matview.  It allows us to zero in on exactly the right rows, and lets us know what needs doing.

Clearly we want the planner to find the best plans for the interim steps rather than hard-coding particular rule-based plans; I just used an example where it's pretty clear what a reasonable plan would be.

Hopefully this makes it fairly clear that the only thing an optimization around an "append-only" assertion for a matview would buy is the ability to skip the probe for an existing record *for rows which are in the delta*.  As long as there is reasonable indexing on the matview, maintenance for the append-only case would not involve scanning the entire matview.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
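P.S.  The per-delta-row merge loop can be sketched the same way (again a toy model, not PostgreSQL code -- the matview is a {val: count_t} dict and the index probe is a dict lookup):

```python
# Hypothetical sketch of applying a {val: net_count} delta to the matview.
def apply_delta(matview, delta):
    """matview: {val: count_t}; delta: {val: net_count}.  Mutates and
    returns matview.  One probe per delta row, never a full scan."""
    for val, net in delta.items():
        if val not in matview:            # index probe finds no match
            if net < 0:
                raise RuntimeError("bug: negative count for new row")
            matview[val] = net            # insert the new row
        else:
            new_count = matview[val] + net
            if new_count < 0:
                raise RuntimeError("bug: count_t went negative")
            elif new_count == 0:
                del matview[val]          # delete rather than store zero
            else:
                matview[val] = new_count  # update count_t in place
    return matview

bar = {5: 10, 7: 3}                       # existing matview rows
print(apply_delta(bar, {5: -10, 6: 10}))  # {7: 3, 6: 10}
```

The work done is proportional to the delta, which is the point: an append-only assertion could only shave off the probe itself, not any whole-matview scan, because no such scan happens in the first place.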