On Mon, May 31, 2021 at 04:16:52PM -0400, Stephen Frost wrote: > Greetings, > > * Andres Freund (and...@anarazel.de) wrote: > > On 2021-05-27 17:00:23 -0400, Bruce Momjian wrote: > > > If you go in that direction, you should make sure pg_upgrade preserves > > > what you use (it does not preserve relfilenode, just pg_class.oid) > > > > Is there a reason for pg_upgrade not to maintain relfilenode, aside from > > implementation simplicity (which is a good reason!). The fact that the old > > and > > new clusters have different relfilenodes does make inspecting some things a > > bit harder. > > This was discussed for a bit during the Unconference (though it was > related to backups and major upgrades which involves replicas) and the > general consensus seemed to be that, no, it wasn't for any specific > reason beyond that pg_upgrade didn't need to preserve relfilenode and > therefore didn't.
Yes, David Steele wanted it so incremental backups after pg_upgrade were smaller, which makes sense. > There was a discussion around if there were possibly any pitfalls that > we might run into, should we try to have pg_upgrade preserve > relfilenodes but I don't *think* there were any actual show stoppers > that came up. The simplest approach, I would think, would be to have it > do the same thing that it does for OIDs today- basically have pg_dump in > binary mode emit a function call to inform the backend of what > relfilenode to use for the next CREATE statement. We would need to also > pass into that function if the table should have a TOAST table and what > the relfilenode for that should be too, for the base table. We'd need > to also handle indexes, mat views, etc, of course. Yes, exactly. The pg_upgrade.c paragraph says: * We control all assignments of pg_class.oid (and relfilenode) so toast * oids are the same between old and new clusters. This is important * because toast oids are stored as toast pointers in user tables. * * While pg_class.oid and pg_class.relfilenode are initially the same * in a cluster, they can diverge due to CLUSTER, REINDEX, or VACUUM * FULL. In the new cluster, pg_class.oid and pg_class.relfilenode will * be the same and will match the old pg_class.oid value. Because of * this, old/new pg_class.relfilenode values will not match if CLUSTER, * REINDEX, or VACUUM FULL have been performed in the old cluster. One tricky case is pg_largeobject, which is copied from the old to new cluster since it has user data. To preserve that relfilenode, you would need to have pg_upgrade perform cluster surgery in each database to renumber its relfilenode to match since it is created by initdb. I can't think of a case where pg_upgrade already does something like that. -- Bruce Momjian <br...@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.