-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday 26 November 2003 10:58 am, Jonathan Gardner wrote:
> On Wednesday 26 November 2003 09:19, Hannu Krosing wrote:
> > What is needed is good algorithms. Writing C code is secondary to that.
> >
> > Similar problem has kept us from implementing updatable views for quite
> > some time.
>
> You are definitely correct.

Preliminary research has shown that:
1) We would put triggers on each table that contributes to a view.

2) We would only be interested in those inserts and deletes (counting an 
update as a delete and then an insert) that satisfy the "WHERE" clause of 
the view.

3) We would implement some sort of differential view update scheme based on 
the paper "Efficiently Updating Materialized Views"[1]. They classify all 
select queries into one of four categories and describe how to go about 
producing a differential update. I couldn't understand this part at all.

- ---

I am already too deep in database theory. While I enjoy reading a good 
paper, I just don't have the background in DB science like I do in Physics. 
For instance, my set theory is very weak (I can't tell if U is a union or 
an intersection, for instanct).

At this point, I have a choice. Drop a couple hundred dollars on database 
theory textbooks and spend the next three months learning it, or hoping one 
of you database theorists out there take pity on me and pick this up or 
coach me through it. Any takers on the second one?

- ---

I have a proposal. Let me hear what you think. There will be two kinds of 
materialized view in PostgreSQL:

The first is the "snapshot" approach. Provide a materialized view, but don't 
provide automatic updating. The user can call a refresh statement to 
repopulate the entire view from time to time as appropriate. The benefit of 
this is that you can use queries that have functions that are immutable or 
stable. (You could probably use volatile as well but I wouldn't recommend 
it). This is so easy to implement, it isn't even funny.

Optimizing this would involve collecting all the inserted / updated /deleted 
rows since the last snapshot. We can have a logging table that accumulates 
all the changes since the last refresh for this purpose. (This kind of 
table may exist anyway for replication or other purposes.)

Finally, we could examine the changed rows, ignore the irrelevant ones, 
ignore the redundant ones (IE, rows that have been inserted and then 
deleted during that time), and decide whether doing a complete refresh 
would be quicker than doing several differential updates to the 
materialized view.


The second is the automatically updated materialized view. Each insert, 
update, or delete executed against the table that are queried for the view 
will trigger a function call using the algorithms proposed in [1]. Every 
function in the query must be stable.


We could provide a mechanism to alter the materialized view between snapshot 
to auto-updated. It would be as simple as refreshing the snapshot and then 
enabling the triggers, or as simple as disabling the triggers.


I already have written a set of scripts in perl that provides the first 
proposal without the optimization idea. I will put them up at GBorg.


Footnotes:
[1] ftp://ftp.research.microsoft.com/users/palarson/sigmod86.ps

- -- 
Jonathan Gardner
[EMAIL PROTECTED]
Live Free, Use Linux!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/yQjZWgwF3QvpWNwRAnqkAKCasoODeV2+KcP52DMXiEyq7pfhmACcCFBV
X28Nd5PvlhR8Xta/G4w2qBQ=
=OLa8
-----END PGP SIGNATURE-----


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Reply via email to