On Sat, Nov 13, 2010 at 9:17 PM, Greg Stark <gsst...@mit.edu> wrote:
> On Sun, Nov 14, 2010 at 1:15 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Cleanup at first connection is something we've been avoiding for years,
>> but maybe it's time to bite the bullet and do that?
>
> Another alternative is to initialize the unlogged tables when you
> first access them. If you try to open a table and there are no files
> attached then go ahead and initialize it by creating an empty table
> and building any indexes.

I thought about that (I've thought about a lot of things in regards to
this feature...). One problem is that you presumably will need to open
the relation before you can decide whether this is the first access
since restart. But by the time you've opened it, you've already taken
an AccessShareLock, and you'll presumably need something a whole lot
stronger than that to do the rebuild. Lock upgrades are usually a good
thing to avoid when possible, although maybe it would be OK in this
case, not sure.

Another problem is that it's not too clear to me where you'd hook in
the logic to do the cleanup. The relcache code seems like an awfully
low-level place to be trying to perpetrate this sort of monkey
business.

> Hm, I had been assuming recovery would be responsible for cleaning up
> the tables even if the first access is responsible for rebuilding
> them. But there's a chance there have been no modifications to them
> since the last checkpoint. But in that case the data in them is fine.
> It would be a weird interface if it only cleared them out sometimes
> based on unpredictable timing though. Avoiding that does require some
> kind of alternate storage scheme other than the WAL to indicate what
> needs to be cleared out. .init files are as good a mechanism even if
> they just mean "unlink this file on startup".

One idea I had was to trigger the rebuild when we notice that the main
relation fork is missing. Then the startup code can just notice the
init fork, annihilate everything else, and call it good. However, this
appears to require modifying some fairly fundamental assumptions of
the current system. smgr.c/md.c believe that nobody should ever try to
read a nonexistent block, and unconditionally throw an error if the
caller tries to do so. You could provide a mode where they don't do
that, and instead return an error indication to the caller.
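
To make that last point concrete, here is a toy sketch in C. It is not
PostgreSQL code: ToyRelation, toy_read_block, the missing_ok flag and
every other name are made up for illustration, and the "forks" are just
in-memory arrays. It shows a read routine that either aborts on a
nonexistent block or hands a status back to the caller, a startup step
that keeps only the init fork, and a caller that reacts to a missing
main fork by copying the init fork. The stronger lock a real rebuild
would need is only marked by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BLOCK_SIZE 16
#define TOY_MAX_BLOCKS 4

/* Hypothetical fork numbers; not PostgreSQL's ForkNumber enum. */
typedef enum { TOY_MAIN_FORK = 0, TOY_INIT_FORK = 1, TOY_NUM_FORKS } ToyFork;

/* Hypothetical result code for the non-throwing read mode. */
typedef enum { TOY_READ_OK, TOY_READ_MISSING } ToyReadStatus;

typedef struct {
    bool exists;                                 /* is the fork present? */
    int  nblocks;                                /* blocks in the fork */
    char blocks[TOY_MAX_BLOCKS][TOY_BLOCK_SIZE];
} ToyForkData;

typedef struct {
    ToyForkData forks[TOY_NUM_FORKS];
} ToyRelation;

/*
 * Read one block into out. With missing_ok = false a nonexistent block
 * is a hard error (the program aborts), modelling the "unconditionally
 * throw an error" behaviour. With missing_ok = true the caller gets
 * TOY_READ_MISSING back and decides what to do.
 */
ToyReadStatus
toy_read_block(const ToyRelation *rel, ToyFork fork, int blkno,
               bool missing_ok, char *out)
{
    const ToyForkData *f = &rel->forks[fork];

    assert(out != NULL);
    if (!f->exists || blkno < 0 || blkno >= f->nblocks) {
        if (!missing_ok) {
            fprintf(stderr, "could not read block %d of fork %d\n",
                    blkno, (int) fork);
            abort();
        }
        return TOY_READ_MISSING;
    }
    memcpy(out, f->blocks[blkno], TOY_BLOCK_SIZE);
    return TOY_READ_OK;
}

/* Startup: if an init fork exists, throw away the main fork. */
void
toy_startup_cleanup(ToyRelation *rel)
{
    if (rel->forks[TOY_INIT_FORK].exists)
        memset(&rel->forks[TOY_MAIN_FORK], 0, sizeof(ToyForkData));
}

/*
 * First access: if the main fork is gone but an init fork exists,
 * recreate the main fork from the init fork and retry the read once.
 */
ToyReadStatus
toy_read_with_rebuild(ToyRelation *rel, int blkno, char *out)
{
    ToyReadStatus st = toy_read_block(rel, TOY_MAIN_FORK, blkno, true, out);

    if (st == TOY_READ_MISSING &&
        !rel->forks[TOY_MAIN_FORK].exists &&
        rel->forks[TOY_INIT_FORK].exists) {
        /* A real rebuild would need a much stronger lock at this point. */
        rel->forks[TOY_MAIN_FORK] = rel->forks[TOY_INIT_FORK];
        st = toy_read_block(rel, TOY_MAIN_FORK, blkno, true, out);
    }
    return st;
}
```

In this sketch the rebuild is driven entirely by the status code coming
back from the lowest layer, which is exactly why the caller at the top
has to know about a failure mode that originates at the bottom.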
Then you could add an additional ReadBuffer mode, say RBM_FAIL, to let
the error percolate back up through that layer to the index AM or heap
code, which could then try to upgrade its lock and recreate the main
fork. However, I really couldn't work up much enthusiasm for
implementing this feature in a way that requires drilling a hole in
the abstraction stack from top to bottom.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company