On Mon, Aug 2, 2021 at 6:38 PM Andres Freund <and...@anarazel.de> wrote:
> I guess there's a somewhat hacky way to get somewhere without actually
> increasing the size. We could take 3 bytes from the fork number and use that
> to get to a 7 byte relfilenode portion. 7 bytes are probably enough for
> everyone.
>
> It's not like we can use those bytes in a useful way, due to alignment
> requirements. Declaring that the high 7 bytes are for the relNode portion and
> the low byte for the fork would still allow efficient comparisons and doesn't
> seem too ugly.

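To make the quoted layout concrete, here is a rough sketch of how I read it: one 64-bit word whose high 7 bytes hold the relNode and whose low byte holds the fork. All of the names below (PackedRelFork, pack_relfork, and so on) are made up for illustration and are not existing PostgreSQL identifiers. The real change would have to fit into the existing structs, which this does not attempt.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: high 56 bits = relNode, low 8 bits = fork number. */
#define RELNODE_BITS 56
#define FORK_BITS    8
#define MAX_RELNODE  ((UINT64_C(1) << RELNODE_BITS) - 1)

/* Illustrative type name only; not a PostgreSQL type. */
typedef uint64_t PackedRelFork;

/* Combine a 56-bit relNode and an 8-bit fork number into one word. */
static inline PackedRelFork
pack_relfork(uint64_t relnode, uint8_t fork)
{
    assert(relnode <= MAX_RELNODE);
    return (relnode << FORK_BITS) | fork;
}

/* Extract the relNode from the high 7 bytes. */
static inline uint64_t
unpack_relnode(PackedRelFork v)
{
    return v >> FORK_BITS;
}

/* Extract the fork number from the low byte. */
static inline uint8_t
unpack_fork(PackedRelFork v)
{
    return (uint8_t) (v & 0xFF);
}

/*
 * Because relNode occupies the high bits, one integer comparison orders
 * values by relNode first and by fork second.
 */
static inline int
relfork_cmp(PackedRelFork a, PackedRelFork b)
{
    return (a > b) - (a < b);
}
```

The point of putting the relNode in the high bytes is that last function: the "efficient comparisons" fall out of a plain integer compare, with no need to unpack the two fields.
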
I think this idea is worth more consideration. It seems like 2^56
relfilenodes ought to be enough for anyone, recalling that you can only ever
have 2^64 bytes of WAL: to exhaust the counter first, you'd have to average
no more than 256 bytes of WAL per relfilenode allocated. So if we do this, we
can eliminate a bunch of code that is there to guard against relfilenodes
being reused. In particular, we can remove the code that leaves a 0-length
tombstone file around until the next checkpoint to guard against relfilenode
reuse. On Windows, we still need https://commitfest.postgresql.org/36/2962/
because of the problem that Windows won't remove files from the directory
listing until they are both unlinked and closed.

But in general this seems like it would lead to cleaner code. For example,
GetNewRelFileNode() needn't loop. If it allocates the smallest unsigned
integer that the cluster (or database) has never previously assigned, the
file should definitely not exist on disk, and if it does, an ERROR is
appropriate, as the database is corrupted. This does assume that allocations
from this new 56-bit relfilenode counter are properly WAL-logged.

I think this would also solve a problem Dilip mentioned to me today: suppose
you make ALTER DATABASE SET TABLESPACE WAL-logged, as he's been trying to do.
Then suppose you do "ALTER DATABASE foo SET TABLESPACE
used_recently_but_not_any_more". You might get an error complaining that
"some relations of database \"%s\" are already in tablespace \"%s\"" because
there could be tombstone files in that database. With this combination of
changes, you could just use the barrier mechanism from
https://commitfest.postgresql.org/36/2962/ to wait for those files to
disappear, because they've got to be previously-unlinked files that Windows
is still returning because they're still open. Failing that, they could be a
sign of a corrupted database, but there are no other possibilities. I think,
anyway.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
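
To illustrate the GetNewRelFileNode() point above, here is a minimal sketch of a loop-free allocator over a never-reused counter. Everything in it is hypothetical: RelNodeCounter, alloc_relnode(), the result codes, and the file-existence callback are stand-ins I made up, and the WAL-logging of the counter advance is only marked by a comment rather than implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Largest value representable in the assumed 56-bit relfilenode space. */
#define RELNODE_LIMIT ((UINT64_C(1) << 56) - 1)

/* Hypothetical result codes; real code would raise an ERROR instead. */
typedef enum
{
    ALLOC_OK,
    ALLOC_EXHAUSTED,   /* counter ran past the 56-bit space */
    ALLOC_CORRUPT      /* a never-assigned number already has a file */
} AllocResult;

/* Hypothetical cluster-wide (or per-database) counter state. */
typedef struct
{
    uint64_t next_relnode;
} RelNodeCounter;

/* Stand-in for "does a file for this relfilenode exist on disk?" */
typedef bool (*FileExistsFn) (uint64_t relnode);

static AllocResult
alloc_relnode(RelNodeCounter *counter, FileExistsFn file_exists,
              uint64_t *result)
{
    uint64_t candidate;

    assert(counter != NULL && file_exists != NULL && result != NULL);

    if (counter->next_relnode > RELNODE_LIMIT)
        return ALLOC_EXHAUSTED;

    /* Take the smallest never-assigned value; it is never handed out again. */
    candidate = counter->next_relnode++;

    /* Assumption: the counter advance would be WAL-logged at this point. */

    /* No retry loop: an existing file here can only mean corruption. */
    if (file_exists(candidate))
        return ALLOC_CORRUPT;

    *result = candidate;
    return ALLOC_OK;
}

/* Toy predicates for trying the sketch out without a filesystem. */
static bool
no_files(uint64_t relnode)
{
    (void) relnode;
    return false;
}

static bool
file_7_exists(uint64_t relnode)
{
    return relnode == 7;
}
```

With no_files as the predicate, successive calls just hand out consecutive values. With file_7_exists and a counter sitting at 7, the call reports corruption instead of skipping ahead to 8, which is the behavioral difference from a looping allocator.
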