On May 6, 2010, at 10:24 PM, Robert Haas wrote: > On Tue, May 4, 2010 at 3:03 PM, Alvaro Herrera > <alvhe...@commandprompt.com> wrote: >>>>> [smgr.c,inval.c] Do we need to call CacheInvalidSmgr for temporary >>>>> relations? I think the only backend that can have an smgr reference >>>>> to a temprel other than the owning backend is bgwriter, and AFAICS >>>>> bgwriter will only have such a reference if it's responding to a >>>>> request by the owning backend to unlink the associated files, in which >>>>> case (I think) the owning backend will have no reference. > > This turns out to be wrong, I think. It seems that what we do is > prevent backends other than the opening backend from reading pages > from non-local temp rels into private or shared buffers, but we don't > actually prevent them from having smgr references. This allows > autovacuum to drop them, for example, in an anti-wraparound situation. > (Thanks to Tom for helping me get my head around this better.) > >>>> Hmm, wasn't there a proposal to have the owning backend delete the files >>>> instead of asking the bgwriter to? >>> >>> I did propose that upthread; it may have been proposed previously >>> also. This might be worth doing independently of the rest of the patch >>> (which I'm starting to fear is doomed, cue ominous soundtrack) since >>> it would reduce the chance of orphaning data files and possibly >>> simplify the logic also. >> >> +1 for doing it separately, but hopefully that doesn't mean the rest of >> this patch is doomed ... > > Doom has been averted. Proposed patch attached. It passes regression > tests and seems to work, but could use additional testing and, of > course, some code-reading also. > > Some notes on this patch: > > It seems prett clear that it isn't desirable to simply add backend ID > to RelFileNode, because there are too many places using RelFileNode > already for purposes where the backend ID can be inferred from > context, such as buffer headers and most of xlog. Instead, I > introduced BackendRelFileNode, which consists of an ordinary > RelFileNode augmented with a backend ID, and use that only where > needed. In particular, the smgr layer must use BackendRelFileNode > throughout, since it operates on both permanent and temporary > relations. smgr invalidations must also include the backend ID. xlog > generally happens only for non-temporary relations and can thus > continue to use an ordinary RelFileNode; however, commit/abort records > must use BackendRelFileNode as there may be physical storage > associated with temporary relations that must be unlinked. > Communication with the bgwriter must use BackendRelFileNode for > similar reasons. The relcache now stores rd_backend rather than > rd_islocaltemp so that it remains straightforward to call smgropen() > based on a relcache entry. Some smgr functions no longer require an > isTemp argument, because they can infer the necessary information from > their BackendRelFileNode. smgrwrite() and smgrextend() now take a > skipFsync argument rather than an isTemp argument. > > I'm not totally sure whether it makes sense to do what we were talking > about above, viz, transfer unlink responsibility for temp rels from > the bgwriter to the owning backend. I haven't done that here. Nor > have I implemented any kind of improved temporary file cleanup > strategy, though I hope such a thing is possible.
Any particular reason not to use directories to help organize things? IE: base/database_oid/pg_temp_rels/backend_pid/relfilenode Perhaps relfilenode should be something else. This seems to have several advantages: 1: It's more organized. If you want to see all the files for a single backend you have just one place to look. Finding everything is still easy via filesystem find. 2: Cleanup becomes easier. When a backend exits, it's entire directory goes away. On server start, everything under pg_temp_rels goes away. Unfortunately we still have a race condition with cleaning up if a backend dies and can't run it's own cleanup, though I think that anytime that happens we're going to restart everything anyway. 3: It separates all the temporary stuff away from real files. The only downside I see is some extra code to create the backend_pid directory. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers