Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL

Pavel Tue, 20 Apr 2010 12:35:35 -0700

Greg Smith wrote:

pavelbaros wrote:
I am also waiting for approval for my repository named "materialized_view" on git.postgresql.org, so I could publish completed parts.
Presuming that you're going to wander there and get assigned what looks like an official repo name for this project is a bit...optimistic. I would recommend that you publish to something like github instead (you can fork http://github.com/postgres/postgres), and if the work looks good enough that it gets picked up by the community maybe you migrate it onto the main site eventually. git.postgresql.org is really not set up to be general hosting space for everyone who has a PostgreSQL related project; almost every repo on there belongs to someone who has already been a steady project contributor for a number of years.

Yes, you're right, I'm kind of a newbie in this kind of project, and especially in PostgreSQL. But I think it is the best way to get into PostgreSQL. When I chose my bachelor thesis I did not know I could participate in GSoC or try to make it committable. Anyway, I will make a repo on github, so everybody can look at it, as soon as possible.


<http://github.com/pbaros/postgres>

(Switching to boilerplate mode for a paragraph...) You have picked a PostgreSQL feature that is dramatically more difficult than it appears to be, and I wouldn't expect you'll actually finish even a fraction of your goals in a summer of work. You're at least in plentiful company--most students do the same. As a rule, if you see a feature on our TODO list that looks really useful and fun to work on, it's only still there because people have tried multiple times to build it completely but not managed to do so because it's harder than it appears. This is certainly the case with materialized views.
You've outlined a reasonable way to build a prototype that does a limited implementation here. The issue is what it will take to extend that into being production quality for the real-world uses of materialized views. How useful your prototype is depends on how well it implements a subset of that in a way that will get used by the final design.
The main hidden complexity in this particular project relates to handling view refreshes. The non-obvious problem is that when the view updates, you need something like a SQL MERGE to really handle that in a robust way that doesn't conflict with concurrent access to queries against the materialized view. And work on MERGE support is itself blocked behind the fact that PostgreSQL doesn't have a good way to lock access to a key value that doesn't exist yet--what other databases call key range locking. See the notes for "Add SQL-standard MERGE/REPLACE/UPSERT command" at http://wiki.postgresql.org/wiki/Todo for more information.
You can work around that to build a prototype by grabbing a full table lock on the materialized view when updating it, but that's not a production quality solution. Solving that little detail is actually more work than the entire project you've outlined. Your suggested implementation--"In function CloseIntoRel executor swap relfilenode's of temp table and original table and finally delete temp table"--is where the full table lock is going to end up. The exact use cases that need materialized views cannot handle a CLUSTER-style table recreation each time that needs an exclusive lock to switch over, so that whole part of your design is going to be a prototype that doesn't work at all like what needs to get built to make this feature committable. It's also not a reasonable assumption that you have enough disk space to hold a second copy of the MV in a production system.
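
To make the locking concern concrete, here is a minimal Python sketch of a swap-style refresh. It is a toy in-memory model, not PostgreSQL code: the class `SwapRefreshedView`, its `read` and `refresh` methods, and the use of `threading.Lock` to stand in for an exclusive relation lock are all assumptions made for illustration.

```python
import threading


class SwapRefreshedView:
    """Toy model of a view refreshed by building a full copy and swapping it in."""

    def __init__(self, query):
        self._query = query              # hypothetical callable returning the view's rows
        self._lock = threading.Lock()    # stands in for an exclusive relation lock
        self._rows = list(query())       # initial materialization

    def read(self):
        # Readers take the same lock, so they wait while a swap is in progress.
        with self._lock:
            return list(self._rows)

    def refresh(self):
        # Step 1: build a complete second copy (the extra storage cost).
        new_rows = list(self._query())
        # Step 2: swap it in while holding the exclusive lock.
        with self._lock:
            self._rows = new_rows
        return len(new_rows)


base_table = [1, 2, 3]
view = SwapRefreshedView(lambda: [x * 10 for x in base_table])
base_table.append(4)      # the base data changes after materialization
stale = view.read()       # still the old snapshot
view.refresh()
fresh = view.read()       # now reflects the new base row
```

In this sketch the old and new copies exist at the same time during `refresh`, and every reader has to go through the same lock as the swap. Those are the two costs described above.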

For now I know it is not committable in its current state, but for my thesis it is enough, and I know it will not be committable with this design at all. In the case of GSoC, whether I consider some other design will depend on the time I am able to spend on it.

Once there's a good way to merge updates, how to efficiently generate them against the sort of large data sets that need materialized views--so you just write out the updates rather than a whole new copy--is itself a large project with a significant quantity of academic research to absorb before starting. Dan Colish at Portland State has been playing around with prototypes for the specific problem of finding a good algorithm for view refreshing that is compatible with PostgreSQL's execution model. He's already recognized the table lock issue here and for the moment is ignoring that part. I don't have a good feel yet for how long the targeted update code will take to mature, but based on what I do know I suspect that little detail is also a larger effort than the entire scope you're envisioning. There's a reason why the MIT Press compendium "Materialized Views: Techniques, Implementations, and Applications" is over 600 pages long--I hope you've already started digging through that material.
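
As a rough illustration of what "write out the updates rather than a whole new copy" means, the following Python sketch computes a merge-style delta between the stored view contents and a new query result, both modelled as dictionaries keyed by a unique key. The function names `compute_delta` and `apply_delta` are hypothetical. The sketch also recomputes the full new result in order to diff it, which a real incremental refresh algorithm would try to avoid by deriving the delta from changes to the base tables.

```python
def compute_delta(old, new):
    """Return (inserts, updates, deletes) that turn `old` into `new`.

    Both arguments map a unique key to a row value.
    """
    inserts = {k: v for k, v in new.items() if k not in old}
    updates = {k: v for k, v in new.items() if k in old and old[k] != v}
    deletes = [k for k in old if k not in new]
    return inserts, updates, deletes


def apply_delta(stored, inserts, updates, deletes):
    """Apply the delta to `stored` in place and return it."""
    for k in deletes:
        del stored[k]
    stored.update(updates)   # rows whose key matched but whose value changed
    stored.update(inserts)   # rows whose key did not exist before
    return stored


old_view = {"a": 1, "b": 2, "c": 3}
new_result = {"a": 1, "b": 20, "d": 4}
inserts, updates, deletes = compute_delta(old_view, new_result)
merged = apply_delta(dict(old_view), inserts, updates, deletes)
```

Only the changed rows ("b", "c" and "d" here) are touched; the unchanged row "a" is never rewritten. Doing this safely under concurrent access is the part that depends on MERGE-like support.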

I would like to start digging through that, but I'm in a hurry now. I have already done a small amount of research on MVs as part of my thesis. I also plan to continue studying PostgreSQL and materialized views in more depth, preferably for my master's thesis. But I realize that a committable MV feature for PostgreSQL is not a project for one person, of course.

Now, with all that said, that doesn't mean there's not a useful project for you buried in this mess. The first two steps in your plan:
1) create materialized view
2) change rewriter
Include building a prototype grammar, doing an initial executor implementation, and getting some sort of rewriter working. That is potentially good groundwork to lay here. I would suggest that you completely drop your step 3:
3) create command that takes snapshot (refresh MV)
Because you cannot build that in a way that will be useful (and by that I mean committable quality) until there's a better way to handle updates than writing a whole new table and grabbing a full relation lock to switch to it. To do a good job just on the first two steps should take at least a whole summer anyway--there's a whole stack of background research needed I haven't seen anyone do yet, and that isn't on your plan yet. There is a precedent for taking this approach. After getting stalled trying to add the entirety of easy partitioning to PostgreSQL, the current scope has been scaled back to just trying to get the syntax and on-disk structure right, then finish off the implementation. See http://wiki.postgresql.org/wiki/Table_partitioning to get an idea how that's been broken into those two major chunks.


Anyway, thanks for all of your advice and help.

best regards,
       Pavel Baros




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
