On Fri, Mar 1, 2019 at 4:09 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.mu...@gmail.com> writes:
> > On Thu, Feb 28, 2019 at 7:37 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> Thomas Munro <thomas.mu...@gmail.com> writes:
> >>> Our current thinking is that smgropen() should know how to map a small
> >>> number of special database OIDs to different smgr implementations
>
> >> Hmm. Maybe mapping based on tablespaces would be a better idea?
>
> > In the undo log proposal (about which more soon) we are using
> > tablespaces for their real purpose, so we need that OID. If you SET
> > undo_tablespaces = foo then future undo data created by your session
> > will be written there, which might be useful for putting that IO on
> > different storage.
>
> Meh. That's a point, but it doesn't exactly seem like a killer argument.
> Just in the abstract, it seems much more likely to me that people would
> want per-database special rels than per-tablespace special rels. And
> I think your notion of a GUC that can control this is probably pie in
> the sky anyway: if we can't afford to look into the catalogs to resolve
> names at this code level, how are we going to handle a GUC?
I have this working like so:

* undo logs have a small amount of meta-data in shared memory, which is stored in a file at checkpoint time, has all changes WAL-logged, and is visible to users in the pg_stat_undo_logs view
* one of the properties of an undo log is its tablespace (the point here being that it's not in a catalog)
* you don't need access to any catalogs to find the backing files for a RelFileNode (the path via the tablespace symlinks is derivable from spcNode)
* therefore you can find your way from an UndoLogRecPtr in (say) a zheap page to the relevant blocks on disk without any catalog access; this should work even in the apparently (but not actually) circular case of a pg_tablespace catalog that is stored in zheap (not something we can do right now, but hypothetically speaking) and has undo data stored in some non-default tablespace that must be consulted while scanning the catalog (not that I'm suggesting it would necessarily be a good idea to support catalogs in non-default tablespaces; I'm just addressing your theoretical point)
* the GUC is used to resolve tablespace names to OIDs only by sessions that are writing, when selecting (or creating) an undo log to attach to and begin writing into; those sessions have no trouble reading the catalog to do so without problematic circularities, as above

Seems to work; the main complications so far were in coming up with reasonable behaviour and interlocking when you drop tablespaces that contain undo logs (short version: if they're not needed for snapshots or rollback, they are dropped, wasting the rest of their undo address space; otherwise they prevent the tablespace from being dropped, with a clear message to that effect).

It doesn't make any sense to put things like clog or any other SLRU in a non-default tablespace, though. It's perfectly OK if not all smgr implementations know how to deal with tablespaces, and the SLRU implementation should simply not support that.
> The real reason I'm concerned about this, though, is that for either
> a database or a tablespace, you can *not* get away with having a magic
> OID just hanging in space with no actual catalog row matching it.
> If nothing else, you need an entry there to prevent someone from
> reusing the OID for another purpose. And a pg_database row that
> doesn't correspond to a real database is going to break all kinds of
> code, starting with pg_upgrade and the autovacuum launcher. Special
> rows in pg_tablespace are much less likely to cause issues, because
> of the precedent of pg_global and pg_default.

GetNewObjectId() never returns values < FirstNormalObjectId, so an OID hard-coded below that can't be handed out for another purpose.

I don't think it's impossible for someone to want to put SMGRs in a catalog of some kind some day. Even though the ones for clog, undo, etc. would probably still need special hard-coded treatment as discussed, I suppose it's remotely possible that someone might some day figure out a useful way to allow extensions that provide different block storage (nvram? zfs zvols? encryption? (see Haribabu's reply)), but I don't have any specific ideas about that or feel inclined to design something for unknown future use.

--
Thomas Munro
https://enterprisedb.com